Issue No.05 - May (2013 vol.25)

pp: 1056-1069

Xinhai Liu, Credit Reference Center & Financial Res. Inst., People's Bank of China, Beijing, China

Shuiwang Ji, Dept. of Comput. Sci., Old Dominion Univ., Norfolk, VA, USA

Wolfgang Glänzel, Dept. of MSI, Katholieke Univ. Leuven, Leuven, Belgium

B. De Moor, Dept. of Electr. Eng., Katholieke Univ. Leuven, Leuven, Belgium

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2012.95

ABSTRACT

Clustering by integrating multiview representations has become a crucial issue for knowledge discovery in heterogeneous environments. However, most prior approaches assume that the multiple representations share the same dimension, which limits their applicability to homogeneous environments. In this paper, we present a novel tensor-based framework for integrating heterogeneous multiview data in the context of spectral clustering. Our framework includes two novel formulations: multiview clustering based on the integration of the Frobenius-norm objective function (MC-FR-OI), and multiview clustering based on matrix integration in the Frobenius-norm objective function (MC-FR-MI). We show that the solutions for both formulations can be computed by tensor decompositions. We evaluate our methods on synthetic data and two real-world data sets in comparison with baseline methods. Experimental results demonstrate that the proposed formulations are effective in integrating multiview data in heterogeneous environments.
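The abstract does not spell out either formulation, so the following is only a minimal sketch of the general idea, not MC-FR-OI or MC-FR-MI themselves. It assumes that each view is a feature matrix over the same N objects (with possibly different feature dimensions), that a Gaussian similarity is a reasonable choice per view, and that a truncated multilinear SVD of the stacked N x N x V similarity tensor gives a usable joint embedding. All function names and the `sigma` parameter are hypothetical.

```python
import numpy as np


def normalized_similarity(X, sigma=1.0):
    """Gaussian similarity with symmetric normalization D^{-1/2} W D^{-1/2}."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-similarity
    inv_sqrt_deg = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return W * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]


def joint_embedding(views, k, sigma=1.0):
    """Stack per-view similarities into an N x N x V tensor and keep the
    k leading left singular vectors of its mode-1 unfolding."""
    S = np.stack([normalized_similarity(X, sigma) for X in views], axis=2)
    n = S.shape[0]
    # Mode-1 unfolding: rows index objects. The column ordering differs
    # between conventions, but the column space (and hence U) does not.
    unfolded = S.reshape(n, -1)
    U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    Z = U[:, :k]
    # Row-normalize, as is common in spectral clustering.
    return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)


def kmeans(Z, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def multiview_cluster(views, k, sigma=1.0):
    """Cluster N objects described by several views (hypothetical helper)."""
    return kmeans(joint_embedding(views, k, sigma), k)
```

Because every view is first mapped to an N x N similarity matrix, views with different feature dimensions can be stacked into one tensor; a call such as `multiview_cluster([view_a, view_b], k=2)` accepts a 2-D and a 5-D view of the same objects.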

INDEX TERMS

Tensile stress, Tin, Vectors, Clustering algorithms, Optimization, Kernel, Matrix decomposition, Higher order orthogonal iteration, Multiview clustering, Tensor decomposition, Spectral clustering, Multilinear singular value decomposition

CITATION

Xinhai Liu, Shuiwang Ji, Wolfgang Glänzel, B. De Moor, "Multiview Partitioning via Tensor Methods",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 25, no. 5, pp. 1056-1069, May 2013, doi:10.1109/TKDE.2012.95