Toward Integrating Feature Selection Algorithms for Classification and Clustering
April 2005 (vol. 17 no. 4)
pp. 491-502
Huan Liu, IEEE
Lei Yu, IEEE
This paper introduces the concepts and algorithms of feature selection, surveys existing feature selection algorithms for classification and clustering, and groups and compares different algorithms within a categorizing framework based on search strategies, evaluation criteria, and data mining tasks. The framework reveals unattempted combinations and provides guidelines for selecting feature selection algorithms. With the categorizing framework, we continue our efforts toward building an integrated system for intelligent feature selection. A unifying platform is proposed as an intermediate step. An illustrative example shows how existing feature selection algorithms can be integrated into a meta-algorithm that takes advantage of individual algorithms. An added advantage of doing so is to help a user employ a suitable algorithm without knowing the details of each. Some real-world applications are included to demonstrate the use of feature selection in data mining. We conclude this work by identifying trends and challenges in feature selection research and development.
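As a concrete illustration of the filter model surveyed in the paper, the sketch below ranks features by a simple evaluation criterion and keeps the top k. It is a minimal, hypothetical example in plain Python, not code from the paper; the scoring function (absolute Pearson correlation with the class label) stands in for the richer measures the survey covers, such as information gain or consistency.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_select(X, y, k):
    """Filter-style feature selection: score each feature independently
    against the class labels, then keep the k best-scoring features.

    X: list of samples, each a list of numeric feature values.
    y: numeric class labels, one per sample.
    Returns the indices of the k top-ranked features.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(reverse=True)  # highest score first
    return [j for _, j in scores[:k]]

# Toy data: feature 0 tracks the label, feature 1 is uncorrelated noise.
X = [[1, 5], [2, 1], [8, 4], [9, 2]]
y = [0, 0, 1, 1]
print(filter_select(X, y, 1))  # → [0]
```

Because the criterion evaluates each feature in isolation, this sketch is fast but, like other univariate filters, blind to feature redundancy and interaction; wrapper and hybrid methods discussed in the survey address exactly that trade-off.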

[1] R. Agrawal, T. Imielinski, and A. Swami, “Database Mining: A Performance Perspective,” IEEE Trans. Knowledge and Data Eng., vol. 5, no. 6, pp. 914-925, 1993.
[2] H. Almuallim and T.G. Dietterich, “Learning with Many Irrelevant Features,” Proc. Ninth Nat'l Conf. Artificial Intelligence, pp. 547-552, 1991.
[3] H. Almuallim and T.G. Dietterich, “Learning Boolean Concepts in the Presence of Many Irrelevant Features,” Artificial Intelligence, vol. 69, nos. 1-2, pp. 279-305, 1994.
[4] C. Apte, B. Liu, E.P.D. Pednault, and P. Smyth, “Business Applications of Data Mining,” Comm. ACM, vol. 45, no. 8, pp. 49-53, 2002.
[5] M. Ben-Bassat, “Pattern Recognition and Reduction of Dimensionality,” Handbook of Statistics-II, P.R. Krishnaiah and L.N. Kanal, eds., pp. 773-791, North Holland, 1982.
[6] A.L. Blum and P. Langley, “Selection of Relevant Features and Examples in Machine Learning,” Artificial Intelligence, vol. 97, pp. 245-271, 1997.
[7] A.L. Blum and R.L. Rivest, “Training a 3-Node Neural Network is NP-Complete,” Neural Networks, vol. 5, pp. 117-127, 1992.
[8] L. Bobrowski, “Feature Selection Based on Some Homogeneity Coefficient,” Proc. Ninth Int'l Conf. Pattern Recognition, pp. 544-546, 1988.
[9] P. Bradley, J. Gehrke, R. Ramakrishnan, and R. Srikant, “Scaling Mining Algorithms to Large Databases,” Comm. ACM, vol. 45, no. 8, pp. 38-43, 2002.
[10] G. Brassard and P. Bratley, Fundamentals of Algorithms. New Jersey: Prentice Hall, 1996.
[11] H. Brighton and C. Mellish, “Advances in Instance Selection for Instance-Based Learning Algorithms,” Data Mining and Knowledge Discovery, vol. 6, no. 2, pp. 153-172, 2002.
[12] C. Cardie, “Using Decision Trees to Improve Case-Based Learning,” Proc. 10th Int'l Conf. Machine Learning, P. Utgoff, ed., pp. 25-32, 1993.
[13] R. Caruana and D. Freitag, “Greedy Attribute Selection,” Proc. 11th Int'l Conf. Machine Learning, pp. 28-36, 1994.
[14] W.G. Cochran, Sampling Techniques. John Wiley & Sons, 1977.
[15] S. Das, “Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection,” Proc. 18th Int'l Conf. Machine Learning, pp. 74-81, 2001.
[16] M. Dash, “Feature Selection via Set Cover,” Proc. IEEE Knowledge and Data Eng. Exchange Workshop, pp. 165-171, 1997.
[17] M. Dash, K. Choi, P. Scheuermann, and H. Liu, “Feature Selection for Clustering-A Filter Solution,” Proc. Second Int'l Conf. Data Mining, pp. 115-122, 2002.
[18] M. Dash and H. Liu, “Feature Selection for Classification,” Intelligent Data Analysis: An Int'l J., vol. 1, no. 3, pp. 131-156, 1997.
[19] M. Dash and H. Liu, “Handling Large Unsupervised Data via Dimensionality Reduction,” Proc. 1999 SIGMOD Research Issues in Data Mining and Knowledge Discovery (DMKD-99) Workshop, 1999.
[20] M. Dash and H. Liu, “Feature Selection for Clustering,” Proc. Fourth Pacific Asia Conf. Knowledge Discovery and Data Mining, (PAKDD-2000), pp. 110-121, 2000.
[21] M. Dash, H. Liu, and H. Motoda, “Consistency Based Feature Selection,” Proc. Fourth Pacific Asia Conf. Knowledge Discovery and Data Mining, (PAKDD-2000), pp. 98-109, 2000.
[22] M. Dash, H. Liu, and J. Yao, “Dimensionality Reduction of Unsupervised Data,” Proc. Ninth IEEE Int'l Conf. Tools with AI (ICTAI '97), pp. 532-539, 1997.
[23] M. Devaney and A. Ram, “Efficient Feature Selection in Conceptual Clustering,” Proc. 14th Int'l Conf. Machine Learning, pp. 92-97, 1997.
[24] P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Prentice Hall Int'l, 1982.
[25] J. Doak, “An Evaluation of Feature Selection Methods and Their Application to Computer Security,” technical report, Univ. of California at Davis, Dept. Computer Science, 1992.
[26] P. Domingos, “Context Sensitive Feature Selection for Lazy Learners,” AI Rev., vol. 14, pp. 227-253, 1997.
[27] J.G. Dy and C.E. Brodley, “Feature Subset Selection and Order Identification for Unsupervised Learning,” Proc. 17th Int'l Conf. Machine Learning, pp. 247-254, 2000.
[28] U.M. Fayyad and K.B. Irani, “Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning,” Proc. 13th Int'l Joint Conf. Artificial Intelligence, pp. 1022-1027, 1993.
[29] U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From Data Mining to Knowledge Discovery: An Overview,” Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., pp. 495-515, AAAI Press/The MIT Press, 1996.
[30] U.M. Fayyad and R. Uthurusamy, “Evolving Data Mining into Solutions for Insights,” Comm. ACM, vol. 45, no. 8, pp. 28-31, 2002.
[31] I. Foroutan and J. Sklansky, “Feature Selection for Automatic Classification of Non-Gaussian Data,” IEEE Trans. Systems, Man, and Cybernetics, vol. 17, no. 2, pp. 187-198, 1987.
[32] J.H. Friedman and J.J. Meulman, “Clustering Objects on Subsets of Attributes,” http://citeseer.ist.psu.edu/friedman02clustering.html, 2002.
[33] B. Gu, F. Hu, and H. Liu, “Sampling: Knowing Whole from Its Part,” Instance Selection and Construction for Data Mining, pp. 21-38, 2001.
[34] M.A. Hall, “Correlation-Based Feature Selection for Discrete and Numeric Class Machine Learning,” Proc. 17th Int'l Conf. Machine Learning, pp. 359-366, 2000.
[35] J. Han and Y. Fu, “Attribute-Oriented Induction in Data Mining,” Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., pp. 399-421, AAAI Press/The MIT Press, 1996.
[36] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufman, 2001.
[37] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer, 2001.
[38] M. Ichino and J. Sklansky, “Feature Selection for Linear Classifiers,” Proc. Seventh Int'l Conf. Pattern Recognition, pp. 124-127, 1984.
[39] M. Ichino and J. Sklansky, “Optimum Feature Selection by Zero-One Programming,” IEEE Trans. Systems, Man, and Cybernetics, vol. 14, no. 5, pp. 737-746, 1984.
[40] A. Jain and D. Zongker, “Feature Selection: Evaluation, Application, and Small Sample Performance,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 153-158, Feb. 1997.
[41] G.H. John, R. Kohavi, and K. Pfleger, “Irrelevant Features and the Subset Selection Problem,” Proc. 11th Int'l Conf. Machine Learning, pp. 121-129, 1994.
[42] Y. Kim, W. Street, and F. Menczer, “Feature Selection for Unsupervised Learning via Evolutionary Search,” Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 365-369, 2000.
[43] K. Kira and L.A. Rendell, “The Feature Selection Problem: Traditional Methods and a New Algorithm,” Proc. 10th Nat'l Conf. Artificial Intelligence, pp. 129-134, 1992.
[44] R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, nos. 1-2, pp. 273-324, 1997.
[45] R. Kohavi, N.J. Rothleder, and E. Simoudis, “Emerging Trends in Business Analytics,” Comm. ACM, vol. 45, no. 8, pp. 45-48, 2002.
[46] D. Koller and M. Sahami, “Toward Optimal Feature Selection,” Proc. 13th Int'l Conf. Machine Learning, pp. 284-292, 1996.
[47] I. Kononenko, “Estimating Attributes: Analysis and Extension of RELIEF,” Proc. Sixth European Conf. Machine Learning, pp. 171-182, 1994.
[48] P. Langley, “Selection of Relevant Features in Machine Learning,” Proc. AAAI Fall Symp. Relevance, pp. 140-144, 1994.
[49] W. Lee, S.J. Stolfo, and K.W. Mok, “Adaptive Intrusion Detection: A Data Mining Approach,” AI Rev., vol. 14, no. 6, pp. 533-567, 2000.
[50] E. Leopold and J. Kindermann, “Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?” Machine Learning, vol. 46, pp. 423-444, 2002.
[51] H. Liu, F. Hussain, C.L. Tan, and M. Dash, “Discretization: An Enabling Technique,” Data Mining and Knowledge Discovery, vol. 6, no. 4, pp. 393-423, 2002.
[52] Feature Extraction, Construction and Selection: A Data Mining Perspective, H. Liu and H. Motoda, eds. Boston: Kluwer Academic, 1998, second printing, 2001.
[53] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic, 1998.
[54] H. Liu and H. Motoda, “Less Is More,” Feature Extraction, Construction and Selection: A Data Mining Perspective, pp. 3-12, chapter 1, 1998, second printing, 2001.
[55] Instance Selection and Construction for Data Mining, H. Liu and H. Motoda, eds. Boston: Kluwer Academic Publishers, 2001.
[56] H. Liu, H. Motoda, and M. Dash, “A Monotonic Measure for Optimal Feature Selection,” Proc. 10th European Conf. Machine Learning, pp. 101-106, 1998.
[57] H. Liu, H. Motoda, and L. Yu, “Feature Selection with Selective Sampling,” Proc. 19th Int'l Conf. Machine Learning, pp. 395-402, 2002.
[58] H. Liu and R. Setiono, “Feature Selection and Classification-A Probabilistic Wrapper Approach,” Proc. Ninth Int'l Conf. Industrial and Eng. Applications of AI and ES, T. Tanaka, S. Ohsuga, and M. Ali, eds., pp. 419-424, 1996.
[59] H. Liu and R. Setiono, “A Probabilistic Approach to Feature Selection-A Filter Solution,” Proc. 13th Int'l Conf. Machine Learning, pp. 319-327, 1996.
[60] H. Liu, L. Yu, M. Dash, and H. Motoda, “Active Feature Selection Using Classes,” Proc. Seventh Pacific-Asia Conf. Knowledge Discovery and Data Mining, pp. 474-485, 2003.
[61] D. Madigan, N. Raghavan, W. DuMouchel, C. Nason, M. Posse, and G. Ridgeway, “Likelihood-Based Data Squashing: A Modeling Approach to Instance Construction,” Data Mining and Knowledge Discovery, vol. 6, no. 2, pp. 173-190, 2002.
[62] A. Miller, Subset Selection in Regression, second ed. Chapman & Hall/CRC, 2002.
[63] P. Mitra, C.A. Murthy, and S.K. Pal, “Unsupervised Feature Selection Using Feature Similarity,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 301-312, Mar. 2002.
[64] M. Modrzejewski, “Feature Selection Using Rough Sets Theory,” Proc. European Conf. Machine Learning, P.B. Brazdil, ed., pp. 213-226, 1993.
[65] A.W. Moore and M.S. Lee, “Efficient Algorithms for Minimizing Cross Validation Error,” Proc. 11th Int'l Conf. Machine Learning, pp. 190-198, 1994.
[66] A.N. Mucciardi and E.E. Gose, “A Comparison of Seven Techniques for Choosing Subsets of Pattern Recognition Properties,” IEEE Trans. Computers, vol. 20, pp. 1023-1031, 1971.
[67] P.M. Narendra and K. Fukunaga, “A Branch and Bound Algorithm for Feature Subset Selection,” IEEE Trans. Computers, vol. 26, no. 9, pp. 917-922, Sept. 1977.
[68] A.Y. Ng, “On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples,” Proc. 15th Int'l Conf. Machine Learning, pp. 404-412, 1998.
[69] K.S. Ng and H. Liu, “Customer Retention via Data Mining,” AI Rev., vol. 14, no. 6, pp. 569-590, 2000.
[70] K. Nigam, A.K. McCallum, S. Thrun, and T. Mitchell, “Text Classification from Labeled and Unlabeled Documents Using EM,” Machine Learning, vol. 39, pp. 103-134, 2000.
[71] A.L. Oliveira and A.S. Vincentelli, “Constructive Induction Using a Non-Greedy Strategy for Feature Selection,” Proc. Ninth Int'l Conf. Machine Learning, pp. 355-360, 1992.
[72] L. Parsons, E. Haque, and H. Liu, “Subspace Clustering for High Dimensional Data: A Review,” SIGKDD Explorations, vol. 6, no. 1, pp. 90-105, 2004.
[73] P. Pudil and J. Novovicova, “Novel Methods for Subset Selection with Respect to Problem Knowledge,” Feature Extraction, Construction and Selection: A Data Mining Perspective, pp. 101-116, 1998, second printing, 2001.
[74] D. Pyle, Data Preparation for Data Mining. Morgan Kaufmann Publishers, 1999.
[75] C.E. Queiros and E.S. Gelsema, “On Feature Selection,” Proc. Seventh Int'l Conf. Pattern Recognition, pp. 128-130, 1984.
[76] T. Reinartz, “A Unifying View on Instance Selection,” Data Mining and Knowledge Discovery, vol. 6, no. 2, pp. 191-210, 2002.
[77] Y. Rui, T.S. Huang, and S. Chang, “Image Retrieval: Current Techniques, Promising Directions and Open Issues,” J. Visual Comm. and Image Representation, vol. 10, no. 4, pp. 39-62, 1999.
[78] J.C. Schlimmer, “Efficiently Inducing Determinations: A Complete and Systematic Search Algorithm that Uses Optimal Pruning,” Proc. 10th Int'l Conf. Machine Learning, pp. 284-290, 1993.
[79] J. Segen, “Feature Selection and Constructive Inference,” Proc. Seventh Int'l Conf. Pattern Recognition, pp. 1344-1346, 1984.
[80] J. Sheinvald, B. Dom, and W. Niblack, “A Modelling Approach to Feature Selection,” Proc. 10th Int'l Conf. Pattern Recognition, pp. 535-539, 1990.
[81] W. Siedlecki and J. Sklansky, “On Automatic Feature Selection,” Int'l J. Pattern Recognition and Artificial Intelligence, vol. 2, pp. 197-220, 1988.
[82] D.B. Skalak, “Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms,” Proc. 11th Int'l Conf. Machine Learning, pp. 293-301, 1994.
[83] N. Slonim, G. Bejerano, S. Fine, and N. Tishby, “Discriminative Feature Selection via Multiclass Variable Memory Markov Model,” Proc. 19th Int'l Conf. Machine Learning, pp. 578-585, 2002.
[84] P. Smyth, D. Pregibon, and C. Faloutsos, “Data-Driven Evolution of Data Mining Algorithms,” Comm. ACM, vol. 45, no. 8, pp. 33-37, 2002.
[85] D.J. Stracuzzi and P.E. Utgoff, “Randomized Variable Elimination,” Proc. 19th Int'l Conf. Machine Learning, pp. 594-601, 2002.
[86] D.L. Swets and J.J. Weng, “Efficient Content-Based Image Retrieval Using Automatic Feature Selection,” IEEE Int'l Symp. Computer Vision, pp. 85-90, 1995.
[87] L. Talavera, “Feature Selection as a Preprocessing Step for Hierarchical Clustering,” Proc. Int'l Conf. Machine Learning (ICML '99), pp. 389-397, 1999.
[88] H. Vafaie and I.F. Imam, “Feature Selection Methods: Genetic Algorithms vs. Greedy-Like Search,” Proc. Int'l Conf. Fuzzy and Intelligent Control Systems, 1994.
[89] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.
[90] N. Wyse, R. Dubes, and A.K. Jain, “A Critical Evaluation of Intrinsic Dimensionality Algorithms,” Pattern Recognition in Practice, E.S. Gelsema and L.N. Kanal, eds., pp. 415-425, Morgan Kaufmann, Inc., 1980.
[91] E. Xing, M. Jordan, and R. Karp, “Feature Selection for High-Dimensional Genomic Microarray Data,” Proc. 18th Int'l Conf. Machine Learning, pp. 601-608, 2001.
[92] L. Xu, P. Yan, and T. Chang, “Best First Strategy for Feature Selection,” Proc. Ninth Int'l Conf. Pattern Recognition, pp. 706-708, 1988.
[93] J. Yang and V. Honavar, “Feature Subset Selection Using A Genetic Algorithm,” Feature Extraction, Construction and Selection: A Data Mining Perspective, pp. 117-136, 1998, second printing, 2001.
[94] Y. Yang and J.O. Pedersen, “A Comparative Study on Feature Selection in Text Categorization,” Proc. 14th Int'l Conf. Machine Learning, pp. 412-420, 1997.
[95] L. Yu and H. Liu, “Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution,” Proc. 20th Int'l Conf. Machine Learning, pp. 856-863, 2003.
[96] L. Yu and H. Liu, “Redundancy Based Feature Selection for Microarray Data,” Proc. 10th ACM SIGKDD Conf. Knowledge Discovery and Data Mining, 2004.

Index Terms:
Feature selection, classification, clustering, categorizing framework, unifying platform, real-world applications.
Citation:
Huan Liu, Lei Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491-502, April 2005, doi:10.1109/TKDE.2005.66