Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy
August 2005 (vol. 27 no. 8)
pp. 1226-1238
Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called the minimal-redundancy-maximal-relevance (mRMR) criterion, for first-order incremental feature selection. We then present a two-stage feature selection algorithm that combines mRMR with other, more sophisticated feature selectors (e.g., wrappers), which allows us to select a compact set of superior features at very low cost. We perform an extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvements in feature selection and classification accuracy.
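
For readers who want a concrete picture of the first-order incremental search the abstract describes, the following is a minimal sketch in Python, assuming discretized feature columns. The helper names (mutual_information, mrmr_select) and the difference form of the score (relevance minus mean redundancy with the already-selected features) are illustrative choices for this sketch, not code from the paper.

import numpy as np

def mutual_information(x, y):
    # Empirical mutual information I(x; y), in nats, for two discrete vectors.
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1)          # joint count table
    joint /= joint.sum()                          # joint probability
    px = joint.sum(axis=1, keepdims=True)         # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)         # marginal p(y)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mrmr_select(X, y, n_features):
    # Greedy first-order incremental selection: at each step, add the
    # candidate maximizing relevance to the class minus mean redundancy
    # with the features chosen so far.
    n_total = X.shape[1]
    relevance = np.array([mutual_information(X[:, j], y)
                          for j in range(n_total)])
    selected = [int(np.argmax(relevance))]        # seed with most relevant
    while len(selected) < n_features:
        best_j, best_score = None, -np.inf
        for j in range(n_total):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(X[:, j], X[:, k])
                                  for k in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

Each greedy step thus trades relevance to the class variable against redundancy with the current subset, which is what lets the incremental search approximate the (combinatorially hard) maximal-dependency criterion at low cost.
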

Index Terms: Feature selection, mutual information, minimal redundancy, maximal relevance, maximal dependency, classification.
Citation:
Hanchuan Peng, Fuhui Long, Chris Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238, Aug. 2005, doi:10.1109/TPAMI.2005.159