Issue No. 7, July 2008 (vol. 30), pp. 1146-1157
Despite its sensitivity to initialization, the Expectation-Maximization (EM) algorithm is widely used for estimating the parameters of finite mixture models, and most popular model-based clustering techniques can yield poor clusters if the parameters are not initialized properly. To reduce this sensitivity to initial points, this paper introduces a novel algorithm for learning mixture models from multivariate data. The proposed algorithm exploits TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) to compute neighborhood local maxima on the likelihood surface using stability regions. In essence, the method combines the strengths of traditional EM with the dynamic and geometric characteristics of the stability regions of the nonlinear dynamical system corresponding to the log-likelihood function. Two phases, an EM phase and a stability-region phase, are applied alternately in the parameter space to achieve successive improvements in the maximum likelihood: the EM phase converges to a local maximum of the likelihood function, and the stability-region phase escapes that local maximum by moving toward neighboring stability regions. The algorithm has been tested on both synthetic and real datasets, demonstrating improved performance compared to other approaches; its robustness with respect to initialization is also illustrated experimentally.
Index Terms: expectation maximization, unsupervised learning, finite mixture models, dynamical systems, stability regions, model-based clustering.
Chandan K. Reddy, Hsiao-Dong Chiang, Bala Rajaratnam, "TRUST-TECH-Based Expectation Maximization for Learning Finite Mixture Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 7, pp. 1146-1157, July 2008, doi:10.1109/TPAMI.2007.70775.
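The alternating two-phase procedure described in the abstract can be illustrated with a minimal sketch on a two-component 1-D Gaussian mixture. This is not the paper's method: the `escape_phase` below is a deliberately simplified stand-in for the TRUST-TECH stability-region computation (it probes outward along each mean's axis and restarts EM where the likelihood turns upward, rather than integrating the underlying dynamical system), and all function names and parameters are illustrative assumptions.

```python
# Sketch of the two-phase loop (EM phase + a crude escape phase) for a
# two-component 1-D Gaussian mixture. The escape phase is a simplified
# stand-in for the TRUST-TECH stability-region phase.
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)

def log_likelihood(data, w, mu1, mu2, s1, s2):
    """Log-likelihood of a two-component 1-D Gaussian mixture."""
    ll = 0.0
    for x in data:
        p1 = w * math.exp(-0.5 * ((x - mu1) / s1) ** 2) / (s1 * SQRT_2PI)
        p2 = (1.0 - w) * math.exp(-0.5 * ((x - mu2) / s2) ** 2) / (s2 * SQRT_2PI)
        ll += math.log(p1 + p2 + 1e-300)
    return ll

def em_phase(data, w, mu1, mu2, s1, s2, iters=200):
    """Standard EM updates; monotonically climbs to a local maximum."""
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        # (the 1/sqrt(2*pi) factor cancels in the ratio, so it is omitted)
        resp = []
        for x in data:
            p1 = w * math.exp(-0.5 * ((x - mu1) / s1) ** 2) / s1
            p2 = (1.0 - w) * math.exp(-0.5 * ((x - mu2) / s2) ** 2) / s2
            resp.append(p1 / (p1 + p2 + 1e-300))
        # M-step: re-estimate weight, means, and standard deviations
        n1 = sum(resp)
        n2 = len(data) - n1
        w = n1 / len(data)
        mu1 = sum(r * x for r, x in zip(resp, data)) / max(n1, 1e-9)
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / max(n2, 1e-9)
        s1 = max(math.sqrt(sum(r * (x - mu1) ** 2
                               for r, x in zip(resp, data)) / max(n1, 1e-9)), 1e-2)
        s2 = max(math.sqrt(sum((1 - r) * (x - mu2) ** 2
                               for r, x in zip(resp, data)) / max(n2, 1e-9)), 1e-2)
    return w, mu1, mu2, s1, s2

def escape_phase(data, params, step=0.25, max_steps=40):
    """Simplified stand-in for the stability-region phase: from a local
    maximum, walk outward along each mean's axis. The likelihood first
    falls; the point where it starts rising again approximates an 'exit
    point' into a neighboring stability region, from which EM restarts."""
    w, mu1, mu2, s1, s2 = params
    base_ll = log_likelihood(data, *params)
    best, best_ll = params, base_ll
    for which in (0, 1):               # which mean to perturb
        for direction in (1.0, -1.0):  # probe both directions
            m1, m2, last = mu1, mu2, base_ll
            for _ in range(max_steps):
                if which == 0:
                    m1 += direction * step
                else:
                    m2 += direction * step
                cur = log_likelihood(data, w, m1, m2, s1, s2)
                if cur > last:         # likelihood turned upward: exit point
                    cand = em_phase(data, w, m1, m2, s1, s2)
                    cand_ll = log_likelihood(data, *cand)
                    if cand_ll > best_ll:
                        best, best_ll = cand, cand_ll
                    break
                last = cur
    return best, best_ll
```

In use, one would alternate `em_phase` and `escape_phase` until the escape phase no longer improves the likelihood; by construction the escape phase returns the original local maximum when no better neighboring maximum is found, so the likelihood never decreases across a cycle.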