Axiomatic Approach to Feature Subset Selection Based on Relevance
March 1999 (vol. 21 no. 3)
pp. 271-277

Abstract—Relevance has traditionally been linked with feature subset selection, but formalization of this link has not been attempted. In this paper, we propose two axioms for feature subset selection—the sufficiency axiom and the necessity axiom—on the basis of which this link is formalized: the expected feature subset is the one that maximizes relevance. Finding the expected feature subset turns out to be NP-hard. We therefore devise a heuristic algorithm with polynomial time complexity to find the expected subset. The experimental results show that the algorithm finds a good subset of features which, when presented to C4.5, results in better prediction accuracy.
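Since exact maximization is NP-hard, the abstract's polynomial-time heuristic is of the greedy family. The paper's actual algorithm is not reproduced here; the following is a minimal illustrative sketch, assuming an entropy-based relevance measure (per the index terms) and greedy forward selection — all function names and the specific criterion (reduction of class entropy conditioned on the chosen features) are assumptions for illustration, not the authors' definitions:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, labels, subset):
    """Estimate H(class | features in `subset`) from the data."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        key = tuple(row[i] for i in subset)  # projection onto the subset
        groups.setdefault(key, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def greedy_select(rows, labels, k):
    """Greedily add the feature that most reduces class uncertainty.

    This is a generic O(k * n_features * n) forward-selection sketch,
    not the paper's specific heuristic.
    """
    n_features = len(rows[0])
    subset = []
    for _ in range(k):
        remaining = [i for i in range(n_features) if i not in subset]
        best = min(remaining,
                   key=lambda i: conditional_entropy(rows, labels, subset + [i]))
        subset.append(best)
    return subset

# Toy data: feature 0 determines the class, feature 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ['a', 'a', 'b', 'b']
print(greedy_select(rows, labels, 1))  # selects feature 0: it makes the class certain
```

A wrapper approach (as in [15], [18]) would instead score subsets by a learner's cross-validated accuracy; the filter-style criterion above only illustrates the idea of subset selection as relevance maximization.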

[1] D.W. Aha and R.L. Bankert, "Feature Selection for Case-Based Classification of Cloud Types," Working Notes of the AAAI-94 Workshop on Case-Based Reasoning, AAAI Press, 1994, pp. 106-112.
[2] H. Almuallim and T.G. Dietterich, "Learning With Many Irrelevant Features," Proc. Ninth Nat'l Conf. Artificial Intelligence. Cambridge, Mass.: MIT Press, 1991, pp. 547-552.
[3] H. Almuallim and T.G. Dietterich, "Learning Boolean Concepts in the Presence of Many Irrelevant Features," Artificial Intelligence, vol. 69, nos. 1-2, pp. 279-305, Nov. 1994.
[4] B. Amirikian and H. Nishimura, "What Size Network Is Good for Generalization of a Specific Task of Interest?" Neural Networks, vol. 7, no. 2, pp. 321-329, 1994.
[5] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, "Occam's Razor," Information Processing Letters, vol. 24, North-Holland, pp. 377-380, 1987.
[6] R. Carnap, Logical Foundations of Probability. Chicago: The Univ. of Chicago Press, 1962.
[7] Clementine: A Data Mining System, Integral Solutions Limited, 1998, http:/
[8] T.M. Cover and J.A. Thomas, Elements of Information Theory. John Wiley & Sons, 1991.
[9] S. Davies and S. Russell, "NP-Completeness of Searches for Smallest Possible Feature Sets," Proc. 1994 AAAI Fall Symp. Relevance. AAAI Press, 1994, pp. 37-39.
[10] P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Englewood Cliffs, N.J.: Prentice Hall, 1982.
[11] I. Düntsch and G. Gediga, "Uncertainty Measures of Rough Set Prediction," Univ. of Ulster and Universität Osnabrück, 1998; to appear in Artificial Intelligence.
[12] U. Fayyad and K. Irani, "What Should Be Minimized in a Decision Tree?" AAAI-90: Proc. Eighth Nat'l Conf. Artificial Intelligence, 1990.
[13] U. Fayyad and K. Irani, "The Attribute Selection Problem in Decision Tree Generation," AAAI-92: Proc. 10th Nat'l Conf. Artificial Intelligence, 1992.
[14] P. Gärdenfors, "On the Logic of Relevance," Synthese, vol. 37, pp. 351-367, 1978.
[15] G.H. John, R. Kohavi, and K. Pfleger, "Irrelevant Features and the Subset Selection Problem," Proc. 11th Int'l Conf. Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1994, pp. 121-129.
[16] J.M. Keynes, A Treatise on Probability. New York: Macmillan, 1921.
[17] K. Kira and L.A. Rendell, "The Feature Selection Problem: Traditional Methods and a New Algorithm," AAAI-92, pp. 129-134, 1992.
[18] R. Kohavi, "Feature Subset Selection as Search With Probabilistic Estimates," R. Greiner and D. Subramanian, eds., Relevance: Proc. 1994 AAAI Fall Symp. AAAI Press, 1994, pp. 122-126.
[19] R. Kohavi and D. Sommerfield, "Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology," Proc. KDD'95, Montreal, Canada, pp. 192-197, 1995.
[20] I. Kononenko, "Estimating Attributes: Analysis and Extensions of RELIEF," Proc. 1994 European Conf. Machine Learning, 1994.
[21] P. Langley, "Selection of Relevant Features in Machine Learning," Relevance: Proc. 1994 AAAI Fall Symp. AAAI Press, 1994, pp. 127-131.
[22] A.Y. Levy, "Creating Abstractions Using Relevance Reasoning," Proc. AAAI-94, 1994.
[23] H. Liu and R. Setiono, "Feature Selection via Discretization of Numeric Attributes," IEEE Trans. Knowledge and Data Eng., vol. 9, no. 4, July/Aug. 1997.
[24] S. Muggleton, Inductive Logic Programming. New York: Academic Press, 1992.
[25] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, Calif.: Morgan Kaufmann, 1988.
[26] J. Quinlan and R. Rivest, "Inferring Decision Trees Using the Minimum Description Length Principle," Information and Computation, vol. 80, pp. 227-248, 1989.
[27] J.R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1992.
[28] J. Rissanen, "Stochastic Complexity and Modeling," Annals of Statistics, vol. 14, pp. 1,080-1,100, 1986.
[29] J.C. Schlimmer, "Efficiently Inducing Determinations: A Complete and Systematic Search Algorithm That Uses Optimal Pruning," ML93, pp. 284-290, 1993.
[30] H. Schweitzer, "Occam Algorithms for Computing Visual Motion," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 11, pp. 1,033-1,042, Nov. 1995.
[31] J.E. Shore and R.W. Johnson, "Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy," IEEE Trans. Information Theory, vol. 26, Jan. 1980.
[32] D.B. Skalak, "Prototype and Feature Selection by Sampling and Random Mutation Hill-Climbing Algorithms," Proc. 11th Int'l Conf. Machine Learning, pp. 293-301, 1994.
[33] D. Subramanian and M.R. Genesereth, "The Relevance of Irrelevance," Proc. IJCAI-87, pp. 416-422, 1987.
[34] J. Ullman, Principles of Database and Knowledge-Base Systems, vol. 1. Computer Science Press, 1988.
[35] C. Wallace and P. Freeman, "Estimation and Inference by Compact Coding," J. Royal Statistical Society (B), vol. 49, pp. 240-265, 1987.
[36] H. Wang, "Towards a Unified Framework of Relevance," Faculty of Informatics, Univ. of Ulster, 1996. rmml.html
[37] D.H. Wolpert, "The Relationship Between Occam's Razor and Convergent Guessing," Complex Systems, vol. 4, pp. 319-368, 1990.

Index Terms:
Machine learning, knowledge discovery, feature subset selection, relevance, entropy.
Hui Wang, David Bell, Fionn Murtagh, "Axiomatic Approach to Feature Subset Selection Based on Relevance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 3, pp. 271-277, March 1999, doi:10.1109/34.754624