Issue No.05 - May (2012 vol.34)
pp: 1031-1039
Shi Yu , Katholieke Universiteit Leuven, Leuven
Léon-Charles Tranchevent , Katholieke Universiteit Leuven, Leuven
Xinhai Liu , Katholieke Universiteit Leuven, Leuven
Wolfgang Glänzel , Katholieke Universiteit Leuven, Leuven
Johan A.K. Suykens , Katholieke Universiteit Leuven, Leuven
Bart De Moor , Katholieke Universiteit Leuven, Leuven
Yves Moreau , Katholieke Universiteit Leuven, Leuven
This paper presents a novel optimized kernel k-means algorithm (OKKC) that combines multiple data sources for clustering analysis. The algorithm uses an alternating minimization framework to optimize the cluster membership and the kernel coefficients, which together form a nonconvex problem. In the proposed algorithm, the problem of optimizing the cluster membership and the problem of optimizing the kernel coefficients are both based on the same Rayleigh quotient objective; therefore, the proposed algorithm converges locally. OKKC has a simpler procedure and lower complexity than other algorithms proposed in the literature. Simulated and real-life data fusion applications are studied experimentally, and the results show that the proposed algorithm achieves comparable performance while being more efficient on large-scale data sets. (The Matlab implementation of the OKKC algorithm is downloadable from
Clustering, data fusion, multiple kernel learning, Fisher discriminant analysis, least-squares support vector machine.
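The alternating scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' OKKC implementation: the paper's kernel-coefficient update is not reproduced here and is replaced, as an assumption made purely for illustration, by a closed-form L2-normalized update that weights each kernel by its value of the kernel k-means trace objective trace(A^T K_j A). All function names (`center_and_scale`, `farthest_point_init`, `kernel_kmeans`, `alternating_kernel_kmeans`) and the toy data are hypothetical.

```python
import numpy as np

def center_and_scale(K):
    """Center a kernel matrix in feature space and scale it to unit trace."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    return Kc / np.trace(Kc)

def farthest_point_init(K, k):
    """Deterministic start: pick k mutually distant points, assign each point to the nearest."""
    d = np.diag(K)
    centers = [0]
    dist = d + d[0] - 2.0 * K[:, 0]            # squared feature-space distance to point 0
    for _ in range(1, k):
        c = int(dist.argmax())
        centers.append(c)
        dist = np.minimum(dist, d + d[c] - 2.0 * K[:, c])
    D = d[:, None] + d[centers][None, :] - 2.0 * K[:, centers]
    return D.argmin(axis=1)

def indicator(labels, k):
    """Membership matrix (n x k) and cluster sizes; empty clusters are counted as size 1."""
    Z = np.zeros((labels.size, k))
    Z[np.arange(labels.size), labels] = 1.0
    return Z, np.maximum(Z.sum(axis=0), 1.0)

def kernel_kmeans(K, labels, k, n_iter=100):
    """Lloyd-style kernel k-means for a fixed kernel, starting from the given labels."""
    for _ in range(n_iter):
        Z, sizes = indicator(labels, k)
        KZ = (K @ Z) / sizes                        # mean similarity of each point to each cluster
        cc = np.einsum("ik,ik->k", Z, KZ) / sizes   # mean within-cluster similarity
        # Squared distance to each cluster mean, up to the row-constant term K_ii.
        new = (cc[None, :] - 2.0 * KZ).argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

def alternating_kernel_kmeans(kernels, k, n_outer=20):
    """Alternate between cluster memberships and kernel weights (illustrative only)."""
    Ks = [center_and_scale(K) for K in kernels]
    theta = np.full(len(Ks), 1.0 / np.sqrt(len(Ks)))    # uniform start, unit L2 norm
    labels = None
    for _ in range(n_outer):
        K = sum(t * Kj for t, Kj in zip(theta, Ks))     # combined kernel
        if labels is None:
            labels = farthest_point_init(K, k)
        new_labels = kernel_kmeans(K, labels, k)        # step 1: memberships for fixed weights
        Z, sizes = indicator(new_labels, k)
        A = Z / np.sqrt(sizes)                          # normalized cluster indicator matrix
        # Step 2 (assumed simplification): weight each kernel by trace(A^T K_j A).
        scores = np.array([np.trace(A.T @ Kj @ A) for Kj in Ks])
        new_theta = scores / np.linalg.norm(scores)
        done = np.array_equal(new_labels, labels) and np.allclose(new_theta, theta)
        labels, theta = new_labels, new_theta
        if done:
            break
    return labels, theta

# Toy data fusion example: one informative view and one pure-noise view.
rng = np.random.default_rng(0)
X1 = np.vstack([rng.normal([-3, 0], 0.5, (10, 2)), rng.normal([3, 0], 0.5, (10, 2))])
X2 = rng.normal(0.0, 1.0, (20, 50))
labels, theta = alternating_kernel_kmeans([X1 @ X1.T, X2 @ X2.T], k=2)
print(labels, theta)
```

In this toy setup, the first ten points should share one label, the last ten the other, and the informative view should receive the larger weight. In this simplified variant both steps work on the same quantity, trace(A^T K_θ A), for the combined kernel K_θ, which mirrors the shared-objective idea stated in the abstract.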
Shi Yu, Léon-Charles Tranchevent, Xinhai Liu, Wolfgang Glänzel, Johan A.K. Suykens, Bart De Moor, Yves Moreau, "Optimized Data Fusion for Kernel k-Means Clustering", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.34, no. 5, pp. 1031-1039, May 2012, doi:10.1109/TPAMI.2011.255