Bibliographic References
Optimized Data Fusion for Kernel k-Means Clustering
May 2012 (vol. 34 no. 5)
pp. 1031-1039
Shi Yu, Katholieke Universiteit Leuven, Leuven
Léon-Charles Tranchevent, Katholieke Universiteit Leuven, Leuven
Xinhai Liu, Katholieke Universiteit Leuven, Leuven
Wolfgang Glänzel, Katholieke Universiteit Leuven, Leuven
Johan A.K. Suykens, Katholieke Universiteit Leuven, Leuven
Bart De Moor, Katholieke Universiteit Leuven, Leuven
Yves Moreau, Katholieke Universiteit Leuven, Leuven
This paper presents a novel optimized kernel k-means clustering algorithm (OKKC) to combine multiple data sources for clustering analysis. The algorithm uses an alternating minimization framework to optimize the cluster membership and the kernel coefficients as a nonconvex problem. In the proposed algorithm, the optimization of the cluster membership and the optimization of the kernel coefficients are both based on the same Rayleigh quotient objective; therefore, the algorithm converges locally. OKKC has a simpler procedure and lower complexity than other algorithms proposed in the literature. Simulated and real-life data fusion applications are studied experimentally, and the results confirm that the proposed algorithm has comparable performance and, moreover, is more efficient on large-scale data sets. (The Matlab implementation of the OKKC algorithm is downloadable from
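The alternating scheme described in the abstract (fix the kernel coefficients and optimize the cluster membership, then fix the membership and update the coefficients) can be illustrated with a simplified sketch. This is not the authors' OKKC implementation: the weight update below is a hypothetical alignment-based heuristic standing in for the paper's Rayleigh-quotient-based update, and `kernel_kmeans` is a plain distance-in-feature-space kernel k-means rather than the spectral relaxation used in the paper.

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Plain kernel k-means on a precomputed n x n kernel matrix K."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.empty((n, k))
        for c in range(k):
            idx = labels == c
            m = idx.sum()
            if m == 0:
                dist[:, c] = np.inf
                continue
            # ||phi(x_i) - mu_c||^2 = K_ii - (2/m) sum_j K_ij + (1/m^2) sum_jl K_jl
            dist[:, c] = (diag - 2.0 * K[:, idx].sum(axis=1) / m
                          + K[np.ix_(idx, idx)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def okkc_sketch(kernels, k, n_outer=10):
    """Alternating minimization sketch: cluster on the combined kernel,
    then reweight each kernel (heuristic surrogate for the OKKC update)."""
    p = len(kernels)
    theta = np.full(p, 1.0 / p)          # uniform initial kernel coefficients
    for _ in range(n_outer):
        K = sum(t * Km for t, Km in zip(theta, kernels))
        labels = kernel_kmeans(K, k)
        # Reweight by alignment between each kernel and the cluster indicator.
        Y = np.eye(k)[labels]            # n x k cluster indicator matrix
        A = Y @ Y.T                      # 1 where two points share a cluster
        scores = np.array([np.sum(Km * A) / np.linalg.norm(Km) for Km in kernels])
        theta = scores / scores.sum()
    return labels, theta
```

As a usage sketch, one could combine a linear and an RBF kernel built from the same synthetic data and read off the learned coefficients `theta`, which are normalized to sum to one.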

[1] E.D. Andersen and K.D. Andersen, "The MOSEK Interior Point Optimizer for Linear Programming: An Implementation of the Homogeneous Algorithm," High Performance Optimization, pp. 197-232, 2000.
[2] H.G. Ayad and M.S. Kamel, "Cumulative Voting Consensus Method for Partitions with a Variable Number of Clusters," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 1, pp. 160-173, Jan. 2008.
[3] R. Bhatia, Matrix Analysis. Springer, 1997.
[4] C.M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[5] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2004.
[6] G. Baudat and F. Anouar, "Generalized Discriminant Analysis Using a Kernel Approach," Neural Computation, vol. 12, no. 10, pp. 2385-2404, 2000.
[7] K. Chaudhuri, S.M. Kakade, K. Livescu, and K. Sridharan, "Multi-View Clustering via Canonical Correlation Analysis," Proc. 26th Ann. Int'l Conf. Machine Learning, 2009.
[8] J. Chen, Z. Zhao, J. Ye, and H. Liu, "Nonlinear Adaptive Distance Metric Learning for Clustering," Proc. 13th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2007.
[9] I. Csiszar and G. Tusnady, "Information Geometry and Alternating Minimization Procedures," Statistics and Decisions, supplement 1, pp. 205-237, 1984.
[10] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the Best Rank-1 and Rank-($r_1, r_2, \ldots, r_n$) Approximation of Higher-Order Tensors," SIAM J. Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324-1342, 2000.
[11] I.S. Dhillon, Y. Guan, and B. Kulis, "Kernel k-Means, Spectral Clustering, and Normalized Cuts," Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 551-556, 2004.
[12] C. Ding and X. He, "K-Means Clustering via Principal Component Analysis," Proc. 21st Int'l Conf. Machine Learning, pp. 225-232, 2004.
[13] C. Ding and X. He, "Linearized Cluster Assignment via Spectral Ordering," Proc. 21st Int'l Conf. Machine Learning, 2004.
[14] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, second ed. John Wiley & Sons, 2001.
[15] A.L.N. Fred and A.K. Jain, "Combining Multiple Clusterings Using Evidence Accumulation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 835-850, June 2005.
[16] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to NP-Completeness. W.H. Freeman, 1979.
[17] M. Girolami, "Mercer Kernel-Based Clustering in Feature Space," IEEE Trans. Neural Networks, vol. 13, no. 3, pp. 780-784, May 2002.
[18] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second ed. Springer, 2009.
[19] R. Hettich and K.O. Kortanek, "Semi-Infinite Programming: Theory, Methods, and Applications," SIAM Rev., vol. 35, no. 3, pp. 380-429, 1993.
[20] P. Howland and H. Park, "Generalizing Discriminant Analysis Using the Generalized Singular Value Decomposition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 8, pp. 995-1006, Aug. 2004.
[21] L. Hubert and P. Arabie, "Comparing Partitions," J. Classification, vol. 2, no. 1, pp. 193-218, 1985.
[22] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Prentice Hall, 1988.
[23] M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K.R. Mueller, and A. Zien, "Efficient and Accurate $L_p$-Norm MKL," Proc. Advances in Neural Information Processing Systems, pp. 997-1005, 2009.
[24] G. Lanckriet, N. Cristianini, P. Bartlett, L.E. Ghaoui, and M.I. Jordan, "Learning the Kernel Matrix with Semidefinite Programming," J. Machine Learning Research, vol. 5, pp. 27-72, 2004.
[25] T. Lange and J.M. Buhmann, "Fusion of Similarity Data in Clustering," Proc. Advances in Neural Information Processing Systems, 2005.
[26] Y. Liang, C. Li, W. Gong, and Y. Pan, "Uncorrelated Linear Discriminant Analysis Based on Weighted Pairwise Fisher Criterion," Pattern Recognition, vol. 40, pp. 3606-3615, 2007.
[27] H. Lu, K.N. Plataniotis, and A.N. Venetsanopoulos, "Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition," IEEE Trans. Neural Networks, vol. 20, no. 1, pp. 103-123, Jan. 2009.
[28] X. Liu, S. Yu, Y. Moreau, B. De Moor, W. Glänzel, and F. Janssens, "Hybrid Clustering of Text Mining and Bibliometrics Applied to Journal Sets," Proc. SIAM Int'l Conf. Data Mining, 2009.
[29] J. Ma, J.L. Sancho-Gómez, and S.C. Ahalt, "Nonlinear Multiclass Discriminant Analysis," IEEE Signal Processing Letters, vol. 10, no. 7, pp. 196-199, July 2003.
[30] D.J.C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge Univ. Press, 2003.
[31] S. Mika, G. Rätsch, J. Weston, and B. Schölkopf, "Fisher Discriminant Analysis with Kernels," Proc. IEEE Signal Processing Soc. Workshop Neural Networks for Signal Processing IX, pp. 41-48, 1999.
[32] C.H. Park and H. Park, "Efficient Nonlinear Dimension Reduction for Clustered Data Using Kernel Functions," Proc. IEEE Third Int'l Conf. Data Mining, pp. 243-250, 2003.
[33] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge Univ. Press, 2004.
[34] G. Sanguinetti, "Dimensionality Reduction of Clustered Data Sets," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 535-540, Mar. 2008.
[35] B. Schölkopf, A. Smola, and K.R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," J. Neural Computation, vol. 10, pp. 1299-1319, 1998.
[36] B. Schölkopf, R. Herbrich, and A.J. Smola, "A Generalized Representer Theorem," Proc. 14th Ann. Conf. Computational Learning Theory and Fifth European Conf. Computational Learning Theory, pp. 416-426, 2001.
[37] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large Scale Multiple Kernel Learning," J. Machine Learning Research, vol. 7, pp. 1531-1565, 2006.
[38] G.W. Stewart and J.G. Sun, Matrix Perturbation Theory. Academic Press, 1990.
[39] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines. World Scientific, 2002.
[40] A. Strehl and J. Ghosh, "Clustering Ensembles: A Knowledge Reuse Framework for Combining Multiple Partitions," J. Machine Learning Research, vol. 3, pp. 583-617, 2002.
[41] W. Tang, Z. Lu, and I.S. Dhillon, "Clustering with Multiple Graphs," Proc. IEEE Ninth Int'l Conf. Data Mining, 2009.
[42] S. Theodoridis and K. Koutroumbas, Pattern Recognition, fourth ed. Academic Press, 2009.
[43] A. Topchy, A.K. Jain, and W. Punch, "Clustering Ensembles: Models of Consensus and Weak Partitions," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866-1881, Dec. 2005.
[44] U. von Luxburg, "A Tutorial on Spectral Clustering," Statistics and Computing, vol. 17, no. 4, pp. 395-416, 2007.
[45] J. Ye, Z. Zhao, and M. Wu, "Discriminative K-Means for Clustering," Proc. Advances in Neural Information Processing Systems, 2007.
[46] J.P. Ye, S.W. Ji, and J.H. Chen, "Multi-Class Discriminant Kernel Learning via Convex Programming," J. Machine Learning Research, vol. 9, pp. 719-758, 2008.
[47] S. Yu, L.-C. Tranchevent, B. De Moor, and Y. Moreau, "Gene Prioritization and Clustering by Multi-View Text Mining," BMC Bioinformatics, vol. 11, no. 28, pp. 1-48, 2010.
[48] S. Yu, T. Falck, A. Daemen, L.C. Tranchevent, J. Suykens, B. De Moor, and Y. Moreau, "$L_2$-Norm Multiple Kernel Learning and Its Application to Biomedical Data Fusion," BMC Bioinformatics, vol. 11, no. 309, pp. 1-53, 2010.
[49] H. Zha, C. Ding, M. Gu, X. He, and H. Simon, "Spectral Relaxation for K-Means Clustering," Proc. Advances in Neural Information Processing Systems, vol. 14, pp. 1057-1064, 2001.
[50] D. Zhou and C.J.C. Burges, "Spectral Clustering and Transductive Learning with Multiple Views," Proc. 24th Int'l Conf. Machine Learning, 2007.
[51] G.K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, 1949.

Index Terms:
Clustering, data fusion, multiple kernel learning, Fisher discriminant analysis, least-squares support vector machine.
Shi Yu, Léon-Charles Tranchevent, Xinhai Liu, Wolfgang Glänzel, Johan A.K. Suykens, Bart De Moor, Yves Moreau, "Optimized Data Fusion for Kernel k-Means Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 1031-1039, May 2012, doi:10.1109/TPAMI.2011.255