

Issue No.08 - August (2011 vol.33)

pp: 1532-1547

Hong Zeng, Southeast University, Nanjing and Hong Kong Baptist University, Hong Kong

Yiu-ming Cheung, Hong Kong Baptist University, Hong Kong

ABSTRACT

The performance of most clustering algorithms relies heavily on the representation of the data in the input space or in the Hilbert space of kernel methods. This paper aims to obtain an appropriate data representation through feature selection or kernel learning within the framework of the Local Learning-Based Clustering (LLC) method (Wu and Schölkopf 2006), which can outperform global learning-based methods when dealing with high-dimensional data lying on a manifold. Specifically, we associate a weight with each feature or kernel and incorporate it into the built-in regularization of the LLC algorithm to take into account the relevance of each feature or kernel for the clustering. Accordingly, the weights are estimated iteratively during the clustering process. We show that the resulting weighted regularization, with an additional constraint on the weights, is equivalent to a known sparsity-promoting penalty. Hence, the weights of irrelevant features or kernels can be shrunk toward zero. Extensive experiments show the efficacy of the proposed methods on benchmark data sets.
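
The equivalence mentioned in the abstract can be illustrated with a standard identity from the sparse-learning literature. The sketch below is only an illustration under that assumption, not the paper's exact formulation. For a coefficient vector `w` and nonnegative weights `tau` that sum to one, minimizing the weighted squared penalty `sum(w**2 / tau)` over `tau` yields the squared L1 norm of `w`, attained at `tau = |w| / sum(|w|)`, so a coefficient that is zero receives zero weight. The function names and the example vector are hypothetical.

```python
import numpy as np


def weighted_penalty(w, tau, eps=1e-12):
    """Weighted squared penalty sum_l w_l^2 / tau_l (eps guards against division by zero)."""
    return float(np.sum(w ** 2 / np.maximum(tau, eps)))


def best_simplex_weights(w):
    """Minimizer of weighted_penalty over the simplex: tau_l = |w_l| / sum_k |w_k|.

    Assumes w has at least one nonzero entry.
    """
    a = np.abs(w)
    return a / a.sum()


# Hypothetical coefficient vector; the third feature is irrelevant (w = 0).
w = np.array([2.0, -1.0, 0.0, 0.5])

tau_star = best_simplex_weights(w)
penalty_at_optimum = weighted_penalty(w, tau_star)
squared_l1_norm = float(np.sum(np.abs(w)) ** 2)

# By the Cauchy-Schwarz inequality, any other point on the simplex
# gives a penalty at least as large; check this on random samples.
rng = np.random.default_rng(0)
random_penalties = [
    weighted_penalty(w, rng.dirichlet(np.ones(w.size))) for _ in range(1000)
]

print(tau_star)            # the irrelevant feature gets weight 0
print(penalty_at_optimum)  # agrees with squared_l1_norm up to rounding
print(min(random_penalties) >= penalty_at_optimum - 1e-9)
```

Because the minimized penalty behaves like an L1-type norm, jointly optimizing the coefficients and the simplex-constrained weights tends to drive the weights of uninformative features to zero, which is the shrinkage effect the abstract describes.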

INDEX TERMS

High-dimensional data, local learning-based clustering, feature selection, kernel learning, sparse weighting.

CITATION

Hong Zeng, Yiu-ming Cheung, "Feature Selection and Kernel Learning for Local Learning-Based Clustering," *IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 33, no. 8, pp. 1532-1547, August 2011, doi:10.1109/TPAMI.2010.215

REFERENCES

- [1] S. Shortreed and M. Meila, "Unsupervised Spectral Learning," Proc. Conf. Uncertainty in Artificial Intelligence, pp. 534-541, 2005.
- [2] M. Yuan and Y. Lin, "Model Selection and Estimation in Regression with Grouped Variables," J. Royal Statistical Soc. Series B, vol. 68, no. 1, pp. 49-67, 2006.
- [3] M. Wu and B. Schölkopf, "A Local Learning Approach for Clustering," Advances in Neural Information Processing Systems, vol. 19, pp. 1529-1536, MIT Press, 2007.
- [4] F. Wang, C.S. Zhang, and T. Li, "Regularized Clustering for Documents," Proc. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 95-102, 2007.
- [5] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," J. Machine Learning Research, vol. 3, nos. 7/8, pp. 1157-1182, 2003.
- [6] Y.M. Cheung and H. Zeng, "Local Kernel Regression Score for Selecting Features of High-Dimensional Data," IEEE Trans. Knowledge and Data Eng., vol. 21, no. 12, pp. 1798-1802, Dec. 2009.
- [7] X. He, D. Cai, and P. Niyogi, "Laplacian Score for Feature Selection," Advances in Neural Information Processing Systems, vol. 18, pp. 507-514, MIT Press, 2005.
- [8] Z. Zhao and H. Liu, "Spectral Feature Selection for Supervised and Unsupervised Learning," Proc. Int'l Conf. Machine Learning, pp. 1151-1158, 2007.
- [9] M. Dash, K. Choi, P. Scheuermann, and H. Liu, "Feature Selection for Clustering - A Filter Solution," Proc. IEEE Int'l Conf. Data Mining, pp. 115-122, 2002.
- [10] J.G. Dy and C.E. Brodley, "Feature Selection for Unsupervised Learning," J. Machine Learning Research, vol. 5, pp. 845-889, 2004.
- [11] M.H.C. Law, A.K. Jain, and M.A.T. Figueiredo, "Feature Selection in Mixture-Based Clustering," Advances in Neural Information Processing Systems, vol. 15, pp. 609-616, MIT Press, 2003.
- [12] V. Roth and T. Lange, "Feature Selection in Clustering Problems," Advances in Neural Information Processing Systems, vol. 16, pp. 473-480, MIT Press, 2004.
- [13] L. Wolf and A. Shashua, "Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach," J. Machine Learning Research, vol. 6, pp. 1855-1887, 2005.
- [14] A. Ng, M. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm," Advances in Neural Information Processing Systems, vol. 14, pp. 849-856, MIT Press, 2002.
- [15] S.X. Yu and J. Shi, "Multiclass Spectral Clustering," Proc. IEEE Int'l Conf. Computer Vision, pp. 313-319, 2003.
- [16] A. Argyriou, T. Evgeniou, and M. Pontil, "Multi-Task Feature Learning," Advances in Neural Information Processing Systems, pp. 41-48, MIT Press, 2007.
- [17] H. Zha, C. Ding, M. Gu, X. He, and H. Simon, "Spectral Relaxation for K-Means Clustering," Advances in Neural Information Processing Systems, vol. 14, pp. 1057-1064, MIT Press, 2001.
- [18] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Dover, 1998.
- [19] D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
- [20] M. West, C. Blanchette, H. Dressman, E. Huang, S. Ishida, R. Spang, H. Zuzan, J.A. Olson Jr., J.R. Marks, and J.R. Nevins, "Predicting the Clinical Status of Human Breast Cancer by Using Gene Expression Profiles," Proc. Nat'l Academy of Sciences USA, vol. 98, no. 20, pp. 11462-11467, 2001.
- [21] J. Khan et al., "Classification and Diagnostic Prediction of Cancers Using Gene Expression Profiling and Artificial Neural Networks," Nature Medicine, vol. 7, pp. 673-679, 2001.
- [22] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine, "Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays," Proc. Nat'l Academy of Sciences USA, vol. 96, no. 12, pp. 6745-6750, 1999.
- [23] T.R. Golub et al., "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring," Science, vol. 286, no. 5439, pp. 531-537, 1999.
- [24] L. Zelnik-Manor and P. Perona, "Self-Tuning Spectral Clustering," Advances in Neural Information Processing Systems, vol. 17, pp. 1601-1608, MIT Press, 2005.
- [25] T. Lange and J. Buhmann, "Fusion of Similarity Data in Clustering," Advances in Neural Information Processing Systems, vol. 18, pp. 723-730, MIT Press, 2006.
- [26] G.R.G. Lanckriet, N. Cristianini, P. Bartlett, M.I.E. Ghaoui, and M.I. Jordan, "Learning the Kernel Matrix with Semidefinite Programming," J. Machine Learning Research, vol. 5, pp. 27-72, 2004.
- [27] J. Ye, S. Ji, and J. Chen, "Multi-Class Discriminant Kernel Learning via Convex Programming," J. Machine Learning Research, vol. 9, pp. 719-758, 2008.
- [28] F.R. Bach and M.I. Jordan, "Learning Spectral Clustering, with Application to Speech Separation," J. Machine Learning Research, vol. 7, pp. 1963-2001, 2006.
- [29] O. Chapelle and V. Vapnik, "Model Selection for Support Vector Machines," Advances in Neural Information Processing Systems, vol. 12, pp. 230-236, MIT Press, 2000.
- [30] B. Schölkopf and A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
- [31] E.K.P. Chong and S.H. Zak, An Introduction to Optimization. John Wiley and Sons Inc., 2001.
- [32] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More Efficiency in Multiple Kernel Learning," Proc. Int'l Conf. Machine Learning, pp. 775-782, 2007.
- [33] A. Rakotomamonjy, F. Bach, Y. Grandvalet, and S. Canu, "SimpleMKL," J. Machine Learning Research, vol. 9, pp. 2491-2521, 2008.
- [34] H. Valizadegan and R. Jin, "Generalized Maximum Margin Clustering and Unsupervised Kernel Learning," Advances in Neural Information Processing Systems, pp. 1417-1424, MIT Press, 2007.
- [35] S. Ullman, M. Vidal-Naquet, and E. Sali, "Visual Features of Intermediate Complexity and Their Use in Classification," Nature Neuroscience, vol. 5, no. 7, pp. 683-687, 2002.
- [36] J.F. Bonnans and A. Shapiro, Perturbation Analysis of Optimization Problems. Springer, 2000.
- [37] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing Multiple Parameters for Support Vector Machines," Machine Learning, vol. 26, no. 1, pp. 131-159, 2002.
- [38] P.H. Calamai and J.J. Moré, "Projected Gradient Methods for Linearly Constrained Problems," Math. Programming, vol. 39, no. 1, pp. 93-116, 1987.
- [39] D.Y. Zhou and C.J.C. Burges, "Spectral Clustering and Transductive Learning with Multiple Views," Proc. Int'l Conf. Machine Learning, pp. 1159-1166, 2007.
- [40] P. Mitra, C.A. Murthy, and S.K. Pal, "Unsupervised Feature Selection Using Feature Similarity," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 301-312, Mar. 2002.
- [41] J. Chen, Z. Zhao, J. Ye, and H. Liu, "Nonlinear Adaptive Distance Metric Learning for Clustering," Proc. ACM SIGKDD, pp. 123-132, 2007.
- [42] D.Y. Yeung, H. Chang, and G. Dai, "Learning the Kernel Matrix by Maximizing a KFD-Based Class Separability Criterion," Pattern Recognition, vol. 40, no. 7, pp. 2021-2028, 2007.
- [43] M.H.C. Law, M.A.T. Figueiredo, and A.K. Jain, "Simultaneous Feature Selection and Clustering Using Mixture Models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1154-1166, Sept. 2004.
- [44] G. Bakır, M. Wu, and J. Eichhorn, "Maximum-Margin Feature Combination for Detection and Categorization," technical report, Max Planck Inst. for Biological Cybernetics, 2005.
- [45] G. Karypis, CLUTO - A Clustering Toolkit, http://www-users.cs.umn.edu/~karypis/cluto/, 2002.
- [46] B. Long, P.S. Yu, and M.Z.F. Zhang, "General Model for Multiple View Unsupervised Learning," Proc. SIAM Int'l Conf. Data Mining, pp. 822-833, 2008.
- [47] A. Argyriou, T. Evgeniou, and M. Pontil, "Convex Multi-Task Feature Learning," Machine Learning, vol. 73, no. 3, pp. 243-272, 2008.
- [48] C.A. Micchelli and M. Pontil, "Learning the Kernel Function via Regularization," J. Machine Learning Research, vol. 6, pp. 1099-1125, 2005.
- [49] C.A. Micchelli and M. Pontil, "Feature Space Perspectives for Learning the Kernel," Machine Learning, vol. 66, no. 2, pp. 297-319, 2007.