Issue No.07 - July (2010 vol.32)
pp: 1298-1309
Jangsun Baek, Chonnam National University, Gwangju
Geoffrey J. McLachlan, University of Queensland, Brisbane
Lloyd K. Flack, University of Queensland, Brisbane
ABSTRACT
Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data, where the number of observations n is not very large relative to their dimension p. In practice, there is often a need to reduce the number of parameters in the specification of the component-covariance matrices even further. To this end, we propose the use of common component-factor loadings, which considerably reduces the number of parameters further still. Moreover, it allows the data to be displayed in low-dimensional plots.
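The parameter saving claimed above can be made concrete by counting free parameters. The following is a minimal sketch, assuming the standard mixture-of-factor-analyzers (MFA) parameterization, in which each component has its own mean, diagonal noise covariance D_i, and p x q loading matrix B_i, and assuming a common-loadings (MCFA) formulation with component means A xi_i and covariances A Omega_i A^T + D, sharing one p x q matrix A and one diagonal D across all g components. These formulas are not stated in this abstract; the function names and the example sizes are hypothetical and chosen only for illustration.

```python
"""Count free parameters: full-covariance mixture vs. MFA vs. MCFA (sketch)."""


def full_cov_params(p: int, g: int) -> int:
    """Normal mixture with unrestricted component covariance matrices."""
    return (g - 1) + g * p + g * p * (p + 1) // 2


def mfa_params(p: int, q: int, g: int) -> int:
    """Mixture of factor analyzers (assumed standard parameterization).

    Per component: mean (p), diagonal D_i (p), and loadings B_i (p * q),
    less q(q-1)/2 because B_i is identifiable only up to a rotation.
    """
    per_component = 2 * p + p * q - q * (q - 1) // 2
    return (g - 1) + g * per_component


def mcfa_params(p: int, q: int, g: int) -> int:
    """Common factor loadings (assumed form: mu_i = A xi_i,
    Sigma_i = A Omega_i A^T + D).

    Shared: diagonal D (p) and loadings A (p * q).
    Per component: xi_i (q) and symmetric Omega_i (q(q+1)/2).
    Less q^2 because A is identifiable only up to a nonsingular
    q x q transformation.
    """
    return (g - 1) + p + q * (p + g) + g * q * (q + 1) // 2 - q * q


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    p, q, g = 50, 4, 3
    print("full covariance:", full_cov_params(p, g))  # 3977
    print("MFA:", mfa_params(p, q, g))                # 884
    print("MCFA:", mcfa_params(p, q, g))              # 278
```

Under these assumptions, the MCFA count grows with p only through the single shared A and D, whereas the MFA count repeats its p-dependent terms for every component.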
INDEX TERMS
Normal mixture models, mixtures of factor analyzers, common factor loadings, model-based clustering.
CITATION
Jangsun Baek, Geoffrey J. McLachlan, Lloyd K. Flack, "Mixtures of Factor Analyzers with Common Factor Loadings: Applications to the Clustering and Visualization of High-Dimensional Data", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.32, no. 7, pp. 1298-1309, July 2010, doi:10.1109/TPAMI.2009.149