Issue No.04 - April (2012 vol.34)
pp: 814-817
Ryan P. Browne, University of Guelph, Guelph
Matthew D. Sparling, University of Guelph, Guelph
We introduce a mixture model in which each mixture component is itself a mixture of a multivariate Gaussian distribution and a multivariate uniform distribution. Although this model could be used for model-based clustering (model-based unsupervised learning) or model-based classification (model-based semi-supervised learning), we focus on the more general model-based classification framework. In this setting, we fit our mixture models to data where some of the observations have known group memberships, and the goal is to predict the memberships of the observations with unknown labels. We also present a density estimation example. A generalized expectation-maximization algorithm is used to estimate the parameters and thereby obtain classifications in this mixture of mixtures model. To simplify the model and the associated parameter estimation, we suggest holding some parameters fixed, which leads to more parsimonious models. A simulation study illustrates how the model allows for bursts of probability and locally higher tails. Two further simulation studies illustrate how the model performs on data simulated from multivariate Gaussian distributions and on data simulated from multivariate t-distributions. This novel approach is also applied to real data, and its performance under the various restrictions is discussed.
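The density described in the abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the abstract does not give the parameterization, so the box-shaped support of the uniform part (`low`, `high`), the within-component Gaussian proportion `rho`, and all function names are assumptions made for this example.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density evaluated at each row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance, using a linear solve instead of an explicit inverse.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + maha))

def uniform_pdf(x, low, high):
    """Uniform density on the box [low, high]; the box support is an assumption."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    inside = np.all((x >= low) & (x <= high), axis=1)
    return inside / np.prod(high - low)

def component_pdf(x, rho, mean, cov, low, high):
    """One component: a Gaussian with weight rho plus a uniform with weight 1 - rho."""
    return rho * gaussian_pdf(x, mean, cov) + (1.0 - rho) * uniform_pdf(x, low, high)

def mixture_pdf(x, weights, components):
    """Outer mixture: weighted sum of the Gaussian-uniform component densities."""
    return sum(w * component_pdf(x, **c) for w, c in zip(weights, components))
```

With `rho = 1` for every component, the sketch reduces to an ordinary Gaussian mixture; values of `rho` below 1 add a flat layer of density over each component's box, which is one way to picture the "locally higher tails" mentioned in the abstract.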
Statistical computing, multivariate statistics.
Ryan P. Browne, Matthew D. Sparling, "Model-Based Learning Using a Mixture of Mixtures of Gaussian and Uniform Distributions", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 34, no. 4, pp. 814-817, April 2012, doi:10.1109/TPAMI.2011.199