
M.A.T. Figueiredo and A.K. Jain, "Unsupervised Learning of Finite Mixture Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381-396, March 2002.
This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization.
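For context, the standard EM baseline that the abstract contrasts against can be sketched in a few lines. The following is a minimal illustrative implementation of EM for a two-component, one-dimensional Gaussian mixture; it is not the paper's algorithm (which integrates MML-based model selection and component pruning into the EM iterations), and the function name `em_gmm_1d` and all parameter choices are this sketch's own assumptions.

```python
# Minimal EM sketch for a 1D Gaussian mixture with a fixed number of
# components k. Illustrative only: Figueiredo & Jain's method additionally
# selects k and annihilates weak components during the iterations.
import math
import random

def em_gmm_1d(data, k=2, iters=200):
    # Crude initialization: spread the means evenly over the data range.
    lo, hi = min(data), max(data)
    means = [lo + (i + 1) * (hi - lo) / (k + 1) for i in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v))
                    / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, data)) / nj
            # Floor the variance to avoid the singular estimates the
            # abstract mentions as a drawback of plain EM.
            variances[j] = max(variances[j], 1e-6)
    return weights, means, variances

# Synthetic data from two well-separated Gaussians.
random.seed(0)
data = ([random.gauss(-2.0, 0.5) for _ in range(200)]
        + [random.gauss(3.0, 0.8) for _ in range(200)])
w, m, v = em_gmm_1d(data)
print(sorted(round(x, 1) for x in m))
```

Note how even this toy version needs an explicit variance floor and a hand-picked initialization; removing both requirements is exactly what motivates the paper's integrated estimation-and-selection approach.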