Issue No.01 - January (2009 vol.21)
pp: 6-20
Evrim Acar, Rensselaer Polytechnic Institute, Troy
Bülent Yener, Rensselaer Polytechnic Institute, Troy
ABSTRACT
Two-way arrays, or matrices, are often not enough to represent all the information in the data, and standard two-way analysis techniques commonly applied to matrices may fail to find the underlying structures in multi-modal datasets. Multiway data analysis has recently become popular as an exploratory analysis tool for discovering the structures in higher-order datasets, where data have more than two modes. We provide a review of significant contributions in the literature on multiway models and algorithms, as well as their applications in diverse disciplines including chemometrics, neuroscience, social network analysis, text mining, and computer vision.
INDEX TERMS
Introductory and Survey, Singular value decomposition, Mining methods and algorithms, Models
CITATION
Evrim Acar, Bülent Yener, "Unsupervised Multiway Data Analysis: A Literature Survey", IEEE Transactions on Knowledge & Data Engineering, vol.21, no. 1, pp. 6-20, January 2009, doi:10.1109/TKDE.2008.112