Unsupervised Feature Selection Using Feature Similarity
March 2002 (vol. 24, no. 3), pp. 301-312

In this article, we describe an unsupervised feature selection algorithm suitable for data sets that are large in both dimension and size. The method measures similarity between features and removes the redundancy this reveals; because it requires no search, it is fast. A new feature similarity measure, called the maximum information compression index, is introduced. The algorithm is generic in nature and is capable of multiscale representation of data sets. Its superiority in terms of speed and performance is established over various real-life data sets of different sizes and dimensions. We also demonstrate how redundancy and information loss in feature selection can be quantified with an entropy measure.
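The maximum information compression index of a pair of features is the smaller eigenvalue of their 2x2 covariance matrix, which has a closed form in terms of the two variances and the correlation coefficient; it is zero exactly when the features are perfectly linearly dependent. The sketch below is a minimal NumPy illustration of that eigenvalue computation; the function name `mici` and the use of sample-based variance and correlation estimates are our own choices, not prescribed by the paper:

```python
import numpy as np

def mici(x, y):
    """Maximum information compression index of two feature vectors:
    the smaller eigenvalue of their 2x2 covariance matrix.  It is 0
    exactly when x and y are perfectly linearly dependent, so smaller
    values indicate more redundancy between the two features."""
    vx, vy = np.var(x), np.var(y)
    rho = np.corrcoef(x, y)[0, 1]  # sample correlation coefficient
    s = vx + vy
    # Smaller root of the characteristic polynomial of [[vx, c], [c, vy]],
    # where c**2 = vx * vy * rho**2.
    return 0.5 * (s - np.sqrt(s * s - 4.0 * vx * vy * (1.0 - rho ** 2)))
```

A feature that is a linear transform of another receives an index of (nearly) zero and would therefore be treated as redundant, while unrelated features receive a strictly positive index.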

[1] U. Fayyad and R. Uthurusamy, "Special Section: Data Mining and Knowledge Discovery in Databases: Introduction," Comm. ACM, vol. 39, no. 11, pp. 24-26, Nov. 1996.
[2] P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Englewood Cliffs: Prentice Hall, 1982.
[3] P. Pudil, J. Novovicová, and J. Kittler, "Floating Search Methods in Feature Selection," Pattern Recognition Letters, vol. 15, pp. 1119-1125, 1994.
[4] D.W. Aha and R.L. Bankert, "A Comparative Evaluation of Sequential Feature Selection Algorithms," Artificial Intelligence and Statistics V, D. Fisher and H.-J. Lenz, eds., New York: Springer-Verlag, 1996.
[5] S.K. Pal and P.P. Wang, eds., Genetic Algorithms for Pattern Recognition. Boca Raton: CRC Press, 1996.
[6] M. Kudo and J. Sklansky, "Comparison of Algorithms that Select Features for Pattern Classifiers," Pattern Recognition, vol. 33, pp. 25-41, 2000.
[7] D. Skalak, "Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms," Proc. 11th Int'l Conf. Machine Learning, pp. 293-301, 1994.
[8] A.W. Moore and M.S. Lee, "Efficient Algorithms for Minimizing Cross Validation Error," Proc. 11th Int'l Conf. Machine Learning, 1994.
[9] H. Liu and R. Setiono, "Some Issues in Scalable Feature Selection," Expert Systems with Applications, vol. 15, pp. 333-339, 1998.
[10] M. Dash and H. Liu, "Unsupervised Feature Selection," Proc. Pacific Asia Conf. Knowledge Discovery and Data Mining, pp. 110-121, 2000.
[11] J. Dy and C. Brodley, "Feature Subset Selection and Order Identification for Unsupervised Learning," Proc. 17th Int'l Conf. Machine Learning, 2000.
[12] S. Basu, C.A. Micchelli, and P. Olsen, "Maximum Entropy and Maximum Likelihood Criteria for Feature Selection from Multivariate Data," Proc. IEEE Int'l Symp. Circuits and Systems, pp. III-267-270, 2000.
[13] S.K. Pal, R.K. De, and J. Basak, "Unsupervised Feature Evaluation: A Neuro-Fuzzy Approach," IEEE Trans. Neural Networks, vol. 11, pp. 366-376, 2000.
[14] M. Hall, "Correlation-Based Feature Selection for Discrete and Numeric Class Machine Learning," Proc. 17th Int'l Conf. Machine Learning (ICML 2000), 2000.
[15] R.P. Heydorn, "Redundancy in Feature Extraction," IEEE Trans. Computers, pp. 1051-1054, 1971.
[16] S.K. Das, "Feature Selection with a Linear Dependence Measure," IEEE Trans. Computers, pp. 1106-1109, 1971.
[17] G.T. Toussaint and T.R. Vilmansen, "Comments on Feature Selection with a Linear Dependence Measure," IEEE Trans. Computers, p. 408, 1972.
[18] K. Kira and L. Rendell, "A Practical Approach to Feature Selection," Proc. Ninth Int'l Workshop Machine Learning, pp. 249-256, 1992.
[19] I. Kononenko, "Estimating Attributes: Analysis and Extensions of RELIEF," Proc. European Conf. Machine Learning, 1994.
[20] D. Koller and M. Sahami, "Towards Optimal Feature Selection," Proc. 13th Int'l Conf. Machine Learning, pp. 284-292, 1996.
[21] B. King, "Step-Wise Clustering Procedures," J. Am. Statistical Assoc., pp. 86-101, 1967.
[22] C.R. Rao, Linear Statistical Inference and Its Applications. John Wiley, 1973.
[23] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, Dept. of Information and Computer Sciences, Univ. of California, Irvine, 1998.
[24] E.L. Lehmann, Testing Statistical Hypotheses. New York: John Wiley, 1976.
[25] A. Aspin, "Tables for Use in Comparisons Whose Accuracy Involves Two Variances," Biometrika, vol. 36, pp. 245-271, 1949.

Index Terms:
data mining, pattern recognition, dimensionality reduction, feature clustering, multiscale representation, entropy measures
P. Mitra, C.A. Murthy, S.K. Pal, "Unsupervised Feature Selection Using Feature Similarity," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 301-312, March 2002, doi:10.1109/34.990133