Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood
July 2000 (vol. 22 no. 7)
pp. 719-725

Abstract—We propose assessing a mixture model in a cluster analysis setting with the integrated completed likelihood. To this end, the observed data are assigned to unknown clusters with a maximum a posteriori (MAP) operator, and the Integrated Completed Likelihood (ICL) is then approximated with a BIC-like criterion (Bayesian Information Criterion). Numerical experiments on simulated and real data show that the resulting ICL criterion performs well both for choosing a mixture model and for choosing a relevant number of clusters. In particular, ICL appears more robust than BIC to violations of some of the mixture model assumptions, and it can select a number of clusters leading to a sensible partitioning of the data.
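The BIC-like approximation of ICL described in the abstract amounts to penalizing BIC by the entropy of the posterior (MAP soft) assignments. The following is a minimal NumPy sketch of that idea for a one-dimensional Gaussian mixture with already-fitted parameters; the function name `icl_from_fit` and the toy parameter values are illustrative choices, not from the paper.

```python
import numpy as np

def log_gauss(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x (elementwise)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def icl_from_fit(x, weights, mus, sigmas):
    """BIC and an ICL approximation for a fitted 1-D Gaussian mixture
    (higher is better). ICL ~= BIC - E, where E is the entropy of the
    posterior cluster-membership probabilities (the MAP soft assignments).
    """
    x = np.asarray(x, dtype=float)
    n, K = x.size, len(weights)
    # log of the component-wise joint terms  pi_k * f_k(x_i), shape (n, K)
    log_joint = (np.log(np.asarray(weights))[None, :]
                 + log_gauss(x[:, None],
                             np.asarray(mus)[None, :],
                             np.asarray(sigmas)[None, :]))
    # observed-data log-likelihood via a stable log-sum-exp over components
    m = log_joint.max(axis=1, keepdims=True)
    log_lik = np.sum(m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1)))
    nu = 3 * K - 1                     # free parameters: (K-1) weights, K means, K sds
    bic = log_lik - 0.5 * nu * np.log(n)
    # posterior responsibilities tau_ik and their total entropy
    tau = np.exp(log_joint - m)
    tau /= tau.sum(axis=1, keepdims=True)
    entropy = -np.sum(tau * np.log(np.clip(tau, 1e-300, None)))
    return bic, bic - entropy
```

Since the entropy term is nonnegative, ICL never exceeds BIC; when clusters overlap heavily the entropy penalty grows, which is the mechanism behind ICL's preference for well-separated partitions.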

[1] M. Aitkin and D.B. Rubin, “Estimation and Hypothesis Testing in Finite Mixture Models,” J. Royal Statistical Soc. B, vol. 47, pp. 67-75, 1985.
[2] J.D. Banfield and A.E. Raftery, “Model-Based Gaussian and Non-Gaussian Clustering,” Biometrics, vol. 49, pp. 803-821, 1993.
[3] C. Biernacki, G. Celeux, and G. Govaert, “Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood,” Technical Report 3521, Inria, 1998.
[4] C. Biernacki and G. Govaert, “Using the Classification Likelihood to Choose the Number of Clusters,” Computing Science and Statistics, vol. 29, pp. 451-457, 1997.
[5] C. Biernacki and G. Govaert, “Choosing Models in Model-Based Clustering and Discriminant Analysis,” J. Statistical Computation and Simulation, vol. 14, pp. 49-71, 1999.
[6] P.G. Bryant, “Large Sample Results for Optimization Based Clustering Methods,” J. Classification, vol. 8, pp. 1-44, 1991.
[7] G. Celeux and G. Govaert, “A Classification EM Algorithm for Clustering and Two Stochastic Versions,” Computational Statistics and Data Analysis, vol. 14, pp. 315-332, 1992.
[8] G. Celeux and G. Govaert, “Gaussian Parsimonious Clustering Models,” Pattern Recognition, vol. 28, pp. 781-793, 1995.
[9] G. Celeux and G. Soromenho, “An Entropy Criterion for Assessing the Number of Clusters in a Mixture Model,” J. Classification, vol. 13, pp. 195-212, 1996.
[10] P. Cheeseman and J. Stutz, “Bayesian Classification (AutoClass): Theory and Results,” Advances in Knowledge Discovery and Data Mining, AAAI Press/MIT Press, pp. 61-83, 1996.
[11] D.M. Chickering and D. Heckerman, “Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables,” Machine Learning, vol. 29, pp. 181-212, 1997.
[12] A. Cutler and O.I. Cordero-Braña, “Minimum Hellinger Distance Estimation for Finite Mixture Models,” J. Am. Statistical Assoc., vol. 91, pp. 1716-1723, 1996.
[13] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm” (with discussion), J. Royal Statistical Soc. B, vol. 39, pp. 1-38, 1977.
[14] J. Diebolt and C.P. Robert, “Estimation of Finite Mixture Distributions through Bayesian Sampling,” J. Royal Statistical Soc. B, vol. 56, pp. 363-375, 1994.
[15] M.D. Escobar and M. West, “Bayesian Density Estimation and Inference Using Mixtures,” J. Am. Statistical Assoc., vol. 90, pp. 577-588, 1995.
[16] B.S. Everitt, An Introduction to Latent Variable Models. London: Chapman & Hall, 1984.
[17] C. Fraley and A.E. Raftery, “How Many Clusters? Which Clustering Method? Answers via Model-Based Cluster Analysis,” Computer J., vol. 41, pp. 578-588, 1998.
[18] R.E. Kass and A.E. Raftery, “Bayes Factors,” J. Am. Statistical Assoc., vol. 90, pp. 733-795, 1995.
[19] R.E. Kass and L. Wasserman, “A Reference Bayesian Test for Nested Hypotheses and Its Relationship to the Schwarz Criterion,” J. Am. Statistical Assoc., vol. 90, pp. 928-934, 1995.
[20] G.J. McLachlan, “The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis,” Handbook of Statistics, vol. 2, pp. 199-208, Amsterdam: North-Holland, 1982.
[21] G.J. McLachlan and K.E. Basford, Mixture Models: Inference and Applications to Clustering. New York: Marcel Dekker, 1988.
[22] J.J. Oliver, R.A. Baxter, and C.S. Wallace, “Unsupervised Learning Using MML,” Proc. 13th Int'l Conf. Machine Learning, pp. 364-372, 1996.
[23] R.A. Redner and H.F. Walker, “Mixture Densities, Maximum Likelihood and the EM Algorithm,” SIAM Review, vol. 26, pp. 195-239, 1984.
[24] C.P. Robert, “Mixtures of Distributions: Inference and Estimation,” Markov Chain Monte Carlo in Practice, W.R. Gilks, S. Richardson, and D.J. Spiegelhalter, eds., pp. 441-464, London: Chapman & Hall, 1996.
[25] S. Roberts, D. Husmeier, I. Rezek, and W. Penny, “Bayesian Approaches to Gaussian Mixture Modeling,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 11, Nov. 1998.
[26] K. Roeder and L. Wasserman, “Practical Bayesian Density Estimation Using Mixtures of Normals,” J. Am. Statistical Assoc., vol. 92, pp. 894-902, 1997.
[27] G. Schwarz, “Estimating the Dimension of a Model,” Annals of Statistics, vol. 6, pp. 461-464, 1978.
[28] P. Smyth, “Model Selection for Probabilistic Clustering Using Cross-Validated Likelihood,” Statistics and Computing, vol. 10, pp. 63-72, 2000.
[29] W.N. Venables and B.D. Ripley, Modern Applied Statistics with S-Plus. New York: Springer-Verlag, 1994.

Index Terms:
Mixture model, clustering, integrated likelihood, BIC, integrated completed likelihood, ICL criterion.
Christophe Biernacki, Gilles Celeux, Gérard Govaert, "Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 7, pp. 719-725, July 2000, doi:10.1109/34.865189