Issue No. 08, August 2010 (vol. 32), pp. 1517-1522
Jose M. Peña, Linköping University, Linköping
Roland Nilsson, Harvard Medical School, Boston
Consider a classification problem involving only discrete features that are represented as random variables with some prescribed discrete sample space. In this paper, we study the complexity of two feature selection problems. The first problem consists of finding a feature subset of a given size k that has minimal Bayes risk. We show that for any increasing ordering of the Bayes risks of the feature subsets that is consistent with an obvious monotonicity constraint (adding features to a subset never increases its Bayes risk), there exists a probability distribution that exhibits that ordering. This implies that solving the first problem requires an exhaustive search over the feature subsets of size k. The second problem consists of finding the minimal feature subset that has minimal Bayes risk. In light of the complexity of the first problem, one may think that solving the second problem requires an exhaustive search over all of the feature subsets. We show that, under mild assumptions, this is not true. We also study the practical implications of our solutions to the second problem.
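To make the two problems concrete, here is a minimal Python sketch. It assumes the joint distribution p(x, y) is given explicitly as a table (a dict mapping a feature tuple and a class label to a probability) and uses 0-1 loss, so the Bayes risk of a feature subset S is 1 - sum over x_S of max_y p(x_S, y). All function names are hypothetical. `best_subset_of_size` is the exhaustive search over size-k subsets that the first result says cannot be avoided in general; `minimal_optimal_subset_bruteforce` is the naive exhaustive search for the second problem, i.e. the baseline the paper argues is unnecessary under mild assumptions. It is not the authors' method.

```python
from collections import defaultdict
from itertools import combinations

# Assumed representation (hypothetical): joint[(x_tuple, y)] = p(x, y).

def bayes_risk(joint, subset):
    """Bayes risk under 0-1 loss using only the features indexed by `subset`:
    R(S) = 1 - sum over x_S of max_y p(x_S, y)."""
    marginal = defaultdict(float)  # (x_S, y) -> p(x_S, y)
    for (x, y), p in joint.items():
        marginal[(tuple(x[i] for i in subset), y)] += p
    best = defaultdict(float)  # x_S -> max_y p(x_S, y)
    for (xs, _y), p in marginal.items():
        best[xs] = max(best[xs], p)
    return 1.0 - sum(best.values())

def best_subset_of_size(joint, n_features, k):
    """Problem 1: exhaustive search over all feature subsets of size k."""
    return min(combinations(range(n_features), k),
               key=lambda s: bayes_risk(joint, s))

def minimal_optimal_subset_bruteforce(joint, n_features, tol=1e-12):
    """Problem 2, naive baseline: scan subsets by increasing size and return the
    first whose risk matches the risk of the full feature set (the minimum,
    by monotonicity)."""
    target = bayes_risk(joint, tuple(range(n_features)))
    for k in range(n_features + 1):
        for s in combinations(range(n_features), k):
            if bayes_risk(joint, s) <= target + tol:
                return s

if __name__ == "__main__":
    # Toy distribution: Y copies X0, X1 is independent noise, all cells uniform.
    joint = {((x0, x1), x0): 0.25 for x0 in (0, 1) for x1 in (0, 1)}
    print(best_subset_of_size(joint, 2, 1))             # (0,)
    print(minimal_optimal_subset_bruteforce(joint, 2))  # (0,)
```

Both searches enumerate subsets combinatorially, which is exactly the cost the first result shows is unavoidable for fixed k in the worst case, and which the second result shows can be avoided for the minimal optimal subset under the paper's assumptions.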
Feature evaluation and selection, classifier design and evaluation, machine learning.
Jose M. Peña, Roland Nilsson, "On the Complexity of Discrete Feature Selection for Optimal Classification", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 8, pp. 1517-1522, August 2010, doi:10.1109/TPAMI.2010.84