Issue No. 3 - March 2013 (vol. 25)
pp. 619-632
Zheng Zhao , SAS Institute Inc., Cary
Lei Wang , University of Wollongong, NSW, Australia
Huan Liu , Arizona State University, Tempe
Jieping Ye , Arizona State University, Tempe
ABSTRACT
In the feature selection literature, different criteria have been proposed to evaluate the goodness of features. In our investigation, we notice that a number of existing selection criteria implicitly select features that preserve sample similarity and can be unified under a common framework. We further point out that no feature selection criterion covered by this framework can handle redundant features, a common drawback of these criteria. Motivated by these observations, we propose a new "Similarity Preserving Feature Selection" framework in an explicit and rigorous way. We show, through theoretical analysis, that the proposed framework not only encompasses many widely used feature selection criteria, but also naturally overcomes their common weakness in handling feature redundancy. In developing this new framework, we begin with a conventional combinatorial optimization formulation for similarity preserving feature selection, then extend it with a sparse multiple-output regression formulation to improve its efficiency and effectiveness. Three algorithms are devised to efficiently solve the proposed formulations, each with its own advantages in terms of computational complexity and selection performance. As exhibited by our extensive experimental study, the proposed framework achieves superior feature selection performance and attractive properties.
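The abstract only outlines the formulation, so the following is a rough, illustrative sketch of the underlying idea rather than the authors' implementation or any of their three algorithms: build a sample similarity matrix, factor it into multiple regression targets, and solve an l2,1-regularized (row-sparse) multiple-output regression so that redundant features are discarded jointly. The function name, the RBF bandwidth heuristic, the number of targets, and the regularization strength below are all assumptions made for illustration, and scikit-learn's MultiTaskLasso is used as a stand-in l2,1-regularized solver.

# A minimal sketch of similarity-preserving feature selection via sparse
# multiple-output regression (illustrative only; NOT the authors' code).
import numpy as np
from sklearn.linear_model import MultiTaskLasso

def similarity_preserving_scores(X, n_targets=5, gamma=None, alpha=0.05):
    """Score features by how well they jointly preserve sample similarity.

    X: (n_samples, n_features) data matrix.
    Returns one score per feature (larger = more useful); hyperparameters are
    illustrative assumptions, not values from the paper.
    """
    # 1. Build a sample similarity (kernel) matrix K; an RBF kernel is one common choice.
    row_sq = np.sum(X**2, axis=1)
    sq_dists = np.clip(row_sq[:, None] + row_sq[None, :] - 2.0 * X @ X.T, 0.0, None)
    if gamma is None:
        gamma = 1.0 / max(np.median(sq_dists), 1e-12)   # heuristic bandwidth
    K = np.exp(-gamma * sq_dists)

    # 2. Factor K ~= Y Y^T using its top eigenpairs; Y serves as the
    #    multiple-output regression target.
    eigvals, eigvecs = np.linalg.eigh(K)                 # eigenvalues in ascending order
    top = slice(-n_targets, None)
    Y = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

    # 3. Solve min_W ||X W - Y||_F^2 + alpha * ||W||_{2,1}; the l2,1 penalty
    #    zeroes out entire rows of W, i.e., drops redundant features jointly.
    model = MultiTaskLasso(alpha=alpha, max_iter=5000)
    model.fit(X, Y)                                      # coef_ has shape (n_targets, n_features)

    # 4. Rank features by the l2 norm of their weight rows (columns of coef_).
    return np.linalg.norm(model.coef_, axis=0)

# Example usage on random data (purely illustrative):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    scores = similarity_preserving_scores(X)
    print("top 5 features:", np.argsort(scores)[::-1][:5])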
INDEX TERMS
Feature selection, similarity preserving, redundancy removal, sparse regularization, multiple-output regression, Laplace equations, algorithm design and analysis, optimization, prediction algorithms, redundancy, feature extraction
CITATION
Zheng Zhao, Lei Wang, Huan Liu, and Jieping Ye, "On Similarity Preserving Feature Selection," IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 3, pp. 619-632, Mar. 2013, doi: 10.1109/TKDE.2011.222.