Beyond Redundancies: A Metric-Invariant Method for Unsupervised Feature Selection
March 2010 (vol. 22 no. 3)
pp. 348-364
Yuexian Hou, Tianjin University, Tianjin and The Hong Kong Polytechnic University, Hong Kong
Peng Zhang, The Robert Gordon University, Aberdeen
Tingxu Yan, Tianjin University, Tianjin
Wenjie Li, The Hong Kong Polytechnic University, Hong Kong
Dawei Song, The Robert Gordon University, Aberdeen
A fundamental goal of unsupervised feature selection is denoising, which aims to identify and remove noisy features that are not discriminative. Due to the lack of information about the real classes, denoising is a challenging task. Noisy features can distort a reasonable distance metric and result in unreasonable feature spaces, i.e., feature spaces in which common clustering algorithms cannot effectively find the real classes. To overcome this problem, we make a primary observation that the relevance of features is intrinsic and independent of any metric scaling on the feature space. This observation implies that feature selection should be invariant, at least to some extent, with respect to metric scaling. In this paper, we clarify the necessity of considering metric invariance in unsupervised feature selection and propose a novel model that incorporates it. The proposed method is motivated by the following observation: if the statistic that guides the unsupervised feature selection process is invariant with respect to possible metric scalings, the solution of the model will also be invariant. Hence, if a metric-invariant model can distinguish discriminative features from noisy ones in a reasonable feature space, it will also work on the unreasonable counterpart obtained from the reasonable one by metric scaling. A theoretical justification of the metric invariance of the proposed model is given, and an empirical evaluation demonstrates its promising performance.
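To make the role of metric invariance concrete, the following is a minimal sketch, not the model proposed in the paper, contrasting a metric-dependent relevance score with a scaling-invariant one on toy data. The function names and the choice of negative excess kurtosis as the invariant statistic are illustrative assumptions; the point is only that a statistic computed on standardized values keeps the same feature ranking after an arbitrary rescaling of a feature, whereas a variance-based score does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature 0 is discriminative (two well-separated clusters),
# feature 1 is pure Gaussian noise.
x0 = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])
x1 = rng.normal(0.0, 1.0, 400)
X = np.column_stack([x0, x1])

def variance_score(col):
    # Metric-dependent: rescaling the feature rescales the score.
    return np.var(col)

def neg_kurtosis_score(col):
    # Illustrative scaling-invariant statistic (not the paper's model):
    # excess kurtosis is computed on standardized values, so any linear
    # rescaling of the feature leaves it unchanged. A bimodal,
    # cluster-revealing feature gives strongly negative kurtosis and thus
    # a large score here; Gaussian noise scores near zero.
    z = (col - col.mean()) / col.std()
    return -(np.mean(z**4) - 3.0)

# Apply a metric scaling: blow up the noisy feature by a factor of 1000.
X_scaled = X.copy()
X_scaled[:, 1] *= 1000.0

for name, score in [("variance", variance_score), ("-kurtosis", neg_kurtosis_score)]:
    before = [score(X[:, j]) for j in range(X.shape[1])]
    after = [score(X_scaled[:, j]) for j in range(X.shape[1])]
    print(f"{name:10s} original: {np.round(before, 2)}  scaled: {np.round(after, 2)}")
```

After the rescaling, the variance-based score ranks the noisy feature as the most "relevant," while the standardized statistic still ranks the bimodal, cluster-revealing feature first in both the original and the rescaled space, which is the behavior the metric-invariance argument above calls for.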

[1] M.H. Law, M.A.T. Figueiredo, and A.K. Jain, “Simultaneous Feature Selection and Clustering Using Mixture Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1154-1166, Sept. 2004.
[2] R. Kohavi and G. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, pp. 273-324, 1997.
[3] H. Liu and L. Yu, “Toward Integrating Feature Selection Algorithms for Classification and Clustering,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 4, pp. 491-502, Apr. 2005.
[4] J. Wilbur and K. Sirotkin, “The Automatic Identification of Stop Words,” J. Information Science, vol. 18, pp. 45-55, 1992.
[5] M. Dash and H. Liu, “Feature Selection for Classification,” Int'l J. Intelligent Data Analysis, vol. 1, pp. 131-156, 1997.
[6] T. Liu, S. Liu, Z. Chen, and W. Ma, “An Evaluation on Feature Selection for Text Clustering,” Proc. Int'l Conf. Machine Learning, pp. 488-495, 2003.
[7] P. Mitra, C.A. Murthy, and S.K. Pal, “Unsupervised Feature Selection Using Feature Similarity,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 301-312, Mar. 2002.
[8] X. Yin and J. Han, “Cross-Relational Clustering with User's Guidance,” Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 344-353, 2005.
[9] M. Campedel, I. Kyrgyzov, and H. Maitre, “Consensual Clustering for Unsupervised Feature Selection. Application to SPOT5 Satellite Images,” J. Machine Learning Research: Workshop and Conf. Proc., vol. 4, pp. 48-59, 2008.
[10] X. He, D. Cai, and P. Niyogi, “Laplacian Score for Feature Selection,” Proc. Conf. Advances in Neural Information Processing Systems, 2005.
[11] Z. Zhao and H. Liu, “Spectral Feature Selection for Supervised and Unsupervised Learning,” Proc. Int'l Conf. Machine Learning, pp. 1151-1157, 2007.
[12] Y. Kim, W. Street, and F. Menczer, “Feature Selection in Unsupervised Learning via Evolutionary Search,” Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 365-369, 2000.
[13] L. Yu and H. Liu, “Efficient Feature Selection via Analysis of Relevance and Redundancy,” J. Machine Learning Research, vol. 5, pp. 1205-1224, 2004.
[14] B. Cao, D. Shen, J. Sun, Q. Yang, and Z. Chen, “Feature Selection in a Kernel Space,” Proc. Int'l Conf. Machine Learning, pp. 121-128, 2007.
[15] I. Guyon and A. Elisseeff, “An Introduction to Variable and Feature Selection,” J. Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[16] H. Peng, F. Long, and C. Ding, “Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238, Aug. 2005.
[17] K. Kira and L.A. Rendell, “A Practical Approach to Feature Selection,” Proc. Int'l Conf. Machine Learning, pp. 249-256, 1992.
[18] J. Dy and C.E. Brodley, “Feature Selection for Unsupervised Learning,” J. Machine Learning Research, vol. 5, pp. 845-889, 2004.
[19] I. Jolliffe, Principal Component Analysis. Springer-Verlag, 1989.
[20] P. Giudici and E. Stanghellini, “Bayesian Inference for Graphical Factor Analysis Models,” Psychometrika, vol. 66, no. 4, pp. 577-591, 2001.
[21] L. Yang and R. Jin, “Distance Metric Learning: A Comprehensive Survey,” technical report, Michigan State Univ., 2006.
[22] Y. Hou, P. Zhang, X. Xu, X. Zhang, and W. Li, “Nonlinear Dimensionality Reduction by Locally Linear Inlaying,” IEEE Trans. Neural Networks, vol. 20, no. 2, pp. 300-315, Feb. 2009.
[23] P.A. Estévez, M. Tesmer, C.A. Perez, and J.M. Zurada, “Normalized Mutual Information Feature Selection,” IEEE Trans. Neural Networks, vol. 20, no. 2, pp. 189-201, Feb. 2009.
[24] L. Wang, N. Zhou, and F. Chu, “A General Wrapper Approach to Selection of Class-Dependent Features,” IEEE Trans. Neural Networks, vol. 19, no. 7, pp. 1267-1278, July 2008.
[25] C. Su and Y. Hsiao, “Multiclass MTS for Simultaneous Feature Selection and Classification,” IEEE Trans. Knowledge and Data Eng., vol. 21, no. 2, pp. 192-205, Feb. 2009.
[26] S. Roweis and L. Saul, “Nonlinear Dimensionality Reduction by Locally Linear Embedding,” Science, vol. 290, pp. 2323-2326, 2000.
[27] M. Belkin and P. Niyogi, “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering,” Proc. Conf. Advances in Neural Information Processing Systems, pp. 585-591, 2002.
[28] Z. Zhang and H. Zha, “Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment,” SIAM J. Scientific Computing, vol. 26, no. 1, pp. 313-338, 2004.
[29] G. Salton and C. Buckley, “Term-Weighting Approaches in Automatic Text Retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513-523, 1988.
[30] Y. Yang and X. Liu, “A Re-Examination of Text Categorization Methods,” Proc. SIGIR Conf., pp. 42-49, 1999.
[31] J. Kybic, “High-Dimensional Entropy Estimation for Finite Accuracy Data: R-nn Entropy Estimator,” Lecture Notes in Computer Science (LNCS), pp. 569-580, Springer, 2007.
[32] G. Darbellay and I. Vajda, “Estimation of the Information by an Adaptive Partitioning of the Observation Space,” IEEE Trans. Information Theory, vol. 45, no. 4, pp. 1315-1321, May 1999.
[33] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press, 1990.
[34] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, “Fisher Discriminant Analysis with Kernels,” Proc. IEEE Conf. Neural Networks for Signal Processing IX, pp. 41-48, 1999.
[35] Y. Sun and J. Li, “Iterative Relief for Feature Weighting,” Proc. Int'l Conf. Machine Learning, pp. 913-920, 2006.
[36] Y. Yang and J.O. Pedersen, “A Comparative Study on Feature Selection in Text Categorization,” Proc. Int'l Conf. Machine Learning, pp. 412-420, 1997.
[37] A. Hyvärinen and E. Oja, “Independent Component Analysis: Algorithms and Applications,” Neural Networks, vol. 13, nos. 4/5, pp. 411-430, 2000.
[38] P. Comon, “Independent Component Analysis—a New Concept?” Signal Processing, vol. 36, pp. 287-314, 1994.
[39] A. Hyvärinen, “Survey on Independent Component Analysis,” Neural Computing Surveys, vol. 2, pp. 94-128, 1999.
[40] N. Karmarkar, “A New Polynomial Time Algorithm for Linear Programming,” Combinatorica, vol. 4, no. 4, pp. 373-395, 1984.

Index Terms:
Feature evaluation and selection, information theory, metric invariant.
Citation:
Yuexian Hou, Peng Zhang, Tingxu Yan, Wenjie Li, Dawei Song, "Beyond Redundancies: A Metric-Invariant Method for Unsupervised Feature Selection," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 3, pp. 348-364, March 2010, doi:10.1109/TKDE.2009.84