Benchmarking Attribute Selection Techniques for Discrete Class Data Mining
November/December 2003 (vol. 15 no. 6)
pp. 1437-1447

Abstract—Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant, and noisy attributes in the model building phase can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations, and as a result very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes, C4.5 and naive Bayes.
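
The selection step described in the abstract (rank the attributes, then cross-validate the ranking against a learner to decide how many to keep) can be illustrated with a small sketch. This is not the authors' implementation. It assumes information gain as the ranking criterion (only one of many possible rankers), a hand-written naive Bayes with Laplace smoothing as the learner, and simple interleaved folds; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of one discrete attribute with respect to the class."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_attributes(rows, labels):
    """Attribute indices sorted from highest to lowest information gain."""
    return sorted(range(len(rows[0])),
                  key=lambda a: info_gain(rows, labels, a), reverse=True)

def nb_predict(train_rows, train_labels, row, attrs):
    """Naive Bayes prediction (Laplace smoothing) using only `attrs`."""
    best, best_score = None, -math.inf
    for c, nc in Counter(train_labels).items():
        score = math.log(nc / len(train_labels))
        for a in attrs:
            n_values = len({r[a] for r in train_rows})
            match = sum(1 for r, y in zip(train_rows, train_labels)
                        if y == c and r[a] == row[a])
            score += math.log((match + 1) / (nc + n_values))
        if score > best_score:
            best, best_score = c, score
    return best

def select_attributes(rows, labels, folds=5):
    """Pick the ranking prefix length k with the best cross-validated accuracy.

    The ranking is recomputed on each training fold so that the held-out
    rows never influence it; the final ranking uses all the data.
    """
    n_attrs = len(rows[0])
    correct = [0] * n_attrs  # correct[k-1]: hits when using the top-k attributes
    for f in range(folds):
        train = [i for i in range(len(rows)) if i % folds != f]
        test = [i for i in range(len(rows)) if i % folds == f]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        ranking = rank_attributes(tr_rows, tr_labels)
        for k in range(1, n_attrs + 1):
            for i in test:
                pred = nb_predict(tr_rows, tr_labels, rows[i], ranking[:k])
                correct[k - 1] += pred == labels[i]
    best_k = correct.index(max(correct)) + 1  # ties go to the smaller subset
    accuracies = [c / len(rows) for c in correct]
    return rank_attributes(rows, labels)[:best_k], accuracies

# Toy data (made up for illustration): attribute 0 determines the class,
# attributes 1 and 2 carry no class information.
rows = [(i % 2, (i // 2) % 3, (i // 5) % 2) for i in range(20)]
labels = ["yes" if r[0] else "no" for r in rows]
selected, accuracies = select_attributes(rows, labels)
print("selected attributes:", selected)
print("accuracy by number of attributes kept:", accuracies)
```

Breaking ties in favour of the shorter prefix is a design choice made here for simplicity; other tie-breaking or significance-based rules would fit the same structure.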

Index Terms:
Attribute selection, classification, benchmarking.
Citation:
Mark A. Hall, Geoffrey Holmes, "Benchmarking Attribute Selection Techniques for Discrete Class Data Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 6, pp. 1437-1447, Nov.-Dec. 2003, doi:10.1109/TKDE.2003.1245283