Issue No.12 - December (2009 vol.21)
pp: 1665-1678
Fei Wang, Tsinghua University, Beijing
Changshui Zhang, Tsinghua University, Beijing
Tao Li, Florida International University, Miami
Clustering is a long-standing research topic in data mining and machine learning. Most traditional clustering methods can be categorized as either local or global. In this paper, we propose a novel clustering method that exploits both the local and the global information in the data set. The method, Clustering with Local and Global Regularization (CLGR), minimizes a cost function that trades off the local and global costs. We show that this optimization problem can be solved by the eigenvalue decomposition of a sparse symmetric matrix, which can be computed efficiently using iterative methods. Finally, experimental results on several data sets are presented to show the effectiveness of our method.
Clustering, local learning, smoothness, regularization.
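The abstract's remark that the eigenvalue decomposition of a sparse symmetric matrix can be done efficiently with iterative methods can be illustrated with a small sketch. The code below is not the CLGR algorithm, and the matrix is not the one constructed in the paper: as a stand-in, it assumes a plain graph Laplacian of a toy graph (two 4-node cliques joined by one edge) and uses shifted power iteration, one of the simplest iterative eigensolvers, to recover the second-smallest eigenvector and split the nodes into two groups. The function names `laplacian`, `matvec`, and `fiedler_vector` and the toy graph are hypothetical, chosen only for illustration.

```python
import math
import random


def laplacian(edges, n):
    """Sparse symmetric graph Laplacian L = D - W, stored as one {col: value} dict per row."""
    rows = [dict() for _ in range(n)]
    for i, j, w in edges:
        rows[i][j] = rows[i].get(j, 0.0) - w
        rows[j][i] = rows[j].get(i, 0.0) - w
        rows[i][i] = rows[i].get(i, 0.0) + w
        rows[j][j] = rows[j].get(j, 0.0) + w
    return rows


def matvec(rows, x):
    """Sparse matrix-vector product; cost is proportional to the number of nonzeros."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def fiedler_vector(rows, iters=500, seed=0):
    """Power iteration on (shift*I - L), restricted to vectors orthogonal to the constant vector.

    Assumes a connected graph with at least one edge. The shift is twice the
    largest diagonal entry, which bounds the largest eigenvalue of L
    (Gershgorin), so the dominant eigenvector of the shifted matrix in this
    subspace is the second-smallest eigenvector of L.
    """
    n = len(rows)
    shift = 2.0 * max(row.get(i, 0.0) for i, row in enumerate(rows))
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]          # remove the constant (eigenvalue 0) component
        lx = matvec(rows, x)
        x = [shift * xi - li for xi, li in zip(x, lx)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    return x


# Toy graph (an assumption for illustration): two 4-node cliques joined by the edge (3, 4).
edges = [(i, j, 1.0) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j, 1.0) for i in range(4, 8) for j in range(i + 1, 8)]
edges.append((3, 4, 1.0))

rows = laplacian(edges, 8)
x = fiedler_vector(rows)
labels = [xi > 0 for xi in x]                # the sign pattern gives a two-way split
rayleigh = sum(xi * li for xi, li in zip(x, matvec(rows, x)))  # eigenvalue estimate
```

Each iteration touches only the nonzero entries of the matrix, which is why iterative solvers of this kind scale to large sparse problems. In practice a Lanczos-type solver from a numerical library would usually converge faster than plain power iteration and would return several eigenvectors at once.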
Fei Wang, Changshui Zhang, Tao Li, "Clustering with Local and Global Regularization," IEEE Transactions on Knowledge & Data Engineering, vol. 21, no. 12, pp. 1665-1678, December 2009, doi:10.1109/TKDE.2009.40