High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length
October 2007 (vol. 29 no. 10)
pp. 1716-1731
We consider the problem of determining the structure of high-dimensional data without prior knowledge of the number of clusters. Data are represented by a finite mixture model based on the generalized Dirichlet distribution, which has a more general covariance structure than the Dirichlet distribution and offers flexibility and ease of use for approximating both symmetric and asymmetric distributions; this makes it more practical for modeling real data. An important problem in mixture modeling is determining the number of clusters: a mixture with too many or too few components may not approximate the true model well. Here, we apply the minimum message length (MML) principle to this task. An MML criterion is derived to choose the number of clusters in the mixture model that best describes the data, and it is compared with other selection criteria. The validation involves synthetic data, real-data clustering, and two real applications: classification of Web pages and texture-database summarization for efficient retrieval.
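
To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation: the log-density of the Connor-Mosimann generalized Dirichlet distribution, and a Wallace-Freeman (MML87) style two-part message length for scoring candidate numbers of clusters K. The function names, the per-parameter quantization constant kappa ≈ 1/12, and the omission of the EM fitting step are illustrative simplifications; the paper itself derives the exact priors and Fisher information for the generalized Dirichlet mixture.

```python
import numpy as np
from scipy.special import gammaln

def gd_logpdf(x, alpha, beta):
    """Log-density of the Connor-Mosimann generalized Dirichlet distribution.

    x     : (D,) proportions with x_i > 0 and sum(x) < 1
    alpha : (D,) positive shape parameters
    beta  : (D,) positive shape parameters
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Exponents of the (1 - x_1 - ... - x_i) factors:
    # gamma_i = beta_i - alpha_{i+1} - beta_{i+1} for i < D, gamma_D = beta_D - 1.
    gamma = beta - 1.0
    gamma[:-1] = beta[:-1] - alpha[1:] - beta[1:]
    tail = 1.0 - np.cumsum(x)  # 1 - sum_{j<=i} x_j, strictly positive here
    log_norm = gammaln(alpha + beta) - gammaln(alpha) - gammaln(beta)
    return float(np.sum(log_norm + (alpha - 1.0) * np.log(x)
                        + gamma * np.log(tail)))

def message_length(loglik, n_params, log_det_fisher, log_prior=0.0):
    """Wallace-Freeman (MML87) style message length; smaller is better.

    loglik         : maximized log-likelihood of the fitted K-component mixture
    n_params       : free parameters, e.g. 2*D*K shapes plus K-1 mixing weights
    log_det_fisher : log-determinant of the Fisher information at the estimate
    log_prior      : log of the prior density at the estimate
    """
    kappa = 1.0 / 12.0  # per-parameter lattice quantization constant (approximation)
    return (-log_prior + 0.5 * log_det_fisher - loglik
            + 0.5 * n_params * (1.0 + np.log(kappa)))

# Model selection sketch: fit a mixture by EM for each candidate K (the EM
# fitting routine is not shown) and keep the K with the smallest message length.
if __name__ == "__main__":
    x = np.array([0.2, 0.3, 0.1])
    print(gd_logpdf(x, alpha=[2.0, 3.0, 1.5], beta=[4.0, 2.0, 3.0]))
```

In use, one would fit a K-component generalized Dirichlet mixture by EM for each candidate K and retain the K that minimizes the message length, which is how the abstract's "number of clusters which best describes the data" is selected.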

Index Terms:
Finite mixture models, generalized Dirichlet mixture, EM, information theory, MML, AIC, MDL, LEC, data clustering, image database summarization, Web mining
Citation:
Nizar Bouguila, Djemel Ziou, "High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1716-1731, Oct. 2007, doi:10.1109/TPAMI.2007.1095