Vol. 33, No. 10, October 2011
pp. 2013-2025
Xiaofei He , Zhejiang University, Hangzhou
Ming Ji , University of Illinois at Urbana-Champaign, Urbana
Chiyuan Zhang , Zhejiang University, Hangzhou
Hujun Bao , Zhejiang University, Hangzhou
In many information processing tasks, one is often confronted with very high-dimensional data. Feature selection techniques are designed to find a meaningful subset of the original features that can facilitate clustering, classification, and retrieval. In this paper, we consider the feature selection problem in unsupervised learning scenarios, which is particularly difficult due to the absence of class labels that would otherwise guide the search for relevant information. Based on Laplacian regularized least squares, which finds a function that is smooth on the data manifold while minimizing the empirical loss, we propose two novel feature selection algorithms that aim to minimize the expected prediction error of the regularized regression model. Specifically, we select those features for which the size of the parameter covariance matrix of the regularized regression model is minimized. Motivated by experimental design, we use the trace and determinant operators to measure the size of the covariance matrix. Efficient computational schemes are also introduced to solve the corresponding optimization problems. Extensive experimental results on various real-life data sets demonstrate the superiority of the proposed algorithms.
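The trace-based criterion described above can be sketched as a greedy A-optimality-style selection: for each candidate feature subset, form the regularized regression's (unnormalized) parameter covariance and pick the feature that most reduces its trace. This is a minimal illustration, not the paper's implementation; the k-NN graph construction, the regularizer `M = I + lam1 * L`, and all function and parameter names here are my own simplifying assumptions.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    # X: (n_samples, n_features). Build a symmetric 0/1 k-NN graph
    # and return its unnormalized Laplacian L = D - W.
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # skip the point itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                 # symmetrize
    return np.diag(W.sum(axis=1)) - W

def greedy_trace_selection(X, m, lam1=0.1, lam2=0.01, k=5):
    """Greedily pick m features whose regularized-regression parameter
    covariance (up to the noise scale) has the smallest trace."""
    n, d = X.shape
    L = knn_graph_laplacian(X, k)
    M = np.eye(n) + lam1 * L               # Laplacian-regularized term
    selected, remaining = [], list(range(d))
    for _ in range(m):
        best_j, best_tr = None, np.inf
        for j in remaining:
            idx = selected + [j]
            Z = X[:, idx]                  # candidate design matrix
            # Covariance proxy of the regression parameters for this subset.
            C = np.linalg.inv(Z.T @ M @ Z + lam2 * np.eye(len(idx)))
            tr = np.trace(C)
            if tr < best_tr:
                best_j, best_tr = j, tr
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Swapping `np.trace` for a log-determinant (D-optimality) gives the determinant-based variant mentioned in the abstract; the greedy loop is otherwise unchanged.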
Feature selection, dimensionality reduction, manifold, regularization, regression, clustering.
Xiaofei He, Ming Ji, Chiyuan Zhang, Hujun Bao, "A Variance Minimization Criterion to Feature Selection Using Laplacian Regularization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 10, pp. 2013-2025, October 2011, doi:10.1109/TPAMI.2011.44