IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 7, July 2008



pp. 1146-1157

ABSTRACT

Despite its sensitivity to initialization, the Expectation-Maximization (EM) algorithm is widely used for estimating the parameters of finite mixture models. Most popular model-based clustering techniques can yield poor clusters if the parameters are not initialized properly. To reduce this sensitivity to initial points, a novel algorithm for learning mixture models from multivariate data is introduced in this paper. The proposed algorithm takes advantage of TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) to compute neighborhood local maxima on the likelihood surface using stability regions. In essence, our method combines the advantages of traditional EM with the dynamic and geometric characteristics of the stability regions of the nonlinear dynamical system corresponding to the log-likelihood function. Two phases, the EM phase and the stability region phase, are alternated in the parameter space to achieve improvements in the maximum likelihood. The EM phase obtains a local maximum of the likelihood function, and the stability region phase helps to escape from that local maximum by moving toward neighboring stability regions. The algorithm has been tested on both synthetic and real datasets, and the improvements in performance compared to other approaches are demonstrated. The robustness with respect to initialization is also illustrated experimentally.
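The two-phase procedure described above alternates a standard EM ascent with a TRUST-TECH stability-region escape step. The sketch below is an illustrative assumption, not the authors' implementation: it implements only the EM phase for a 1-D two-component Gaussian mixture and shows how different initializations land in different local maxima of the log-likelihood, which is precisely the situation the stability-region phase is designed to escape. The function `em_gmm` and all parameter names are hypothetical.

```python
import numpy as np

def em_gmm(x, k, means, max_iter=100, tol=1e-6):
    """One EM phase: ascend the log-likelihood of a 1-D Gaussian
    mixture from the given initial means (illustrative sketch)."""
    n = len(x)
    weights = np.full(k, 1.0 / k)
    sigmas = np.full(k, np.std(x))
    means = np.asarray(means, dtype=float).copy()
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: posterior responsibility of each component for each point
        dens = (weights / (sigmas * np.sqrt(2 * np.pi))
                * np.exp(-0.5 * ((x[:, None] - means) / sigmas) ** 2))
        ll = np.log(dens.sum(axis=1)).sum()
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from responsibilities
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        sigmas = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
        sigmas = np.maximum(sigmas, 1e-3)  # guard against variance collapse
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return means, weights, sigmas, ll

rng = np.random.default_rng(0)
# Bimodal data: two well-separated Gaussian clusters at -3 and +3
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])

# A good initialization separates the modes; a degenerate symmetric
# initialization gets stuck at an inferior stationary point. Moving the
# search from one such basin to a neighboring one is the job of the
# TRUST-TECH stability-region phase, which is not implemented here.
means_good, _, _, ll_good = em_gmm(x, 2, means=[-1.0, 1.0])
means_bad, _, _, ll_bad = em_gmm(x, 2, means=[0.0, 0.0])
```

Running EM from several initializations and comparing the final log-likelihoods (`ll_good` vs. `ll_bad` above) is the standard multi-start baseline that the paper's stability-region approach is designed to improve on.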

INDEX TERMS

expectation maximization, unsupervised learning, finite mixture models, dynamical systems, stability regions, model-based clustering.

CITATION

Chandan K. Reddy, Hsiao-Dong Chiang, Bala Rajaratnam, "TRUST-TECH-Based Expectation Maximization for Learning Finite Mixture Models",

*IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 30, no. 7, pp. 1146-1157, July 2008, doi:10.1109/TPAMI.2007.70775

REFERENCES

- [3] J.A. Bilmes, “A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models,” technical report, Univ. of California, Berkeley, Apr. 1998.
- [4] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, Dept. of Information and Computer Sciences, Univ. of California, Irvine, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
- [7] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc. Series B, vol. 39, no. 1, pp. 1-38, 1977.
- [8] G. Elidan, M. Ninio, N. Friedman, and D. Schuurmans, “Data Perturbation for Escaping Local Maxima in Learning,” Proc. 18th Nat'l Conf. Artificial Intelligence, pp. 132-139, 2002.
- [10] Z. Ghahramani and G.E. Hinton, “The EM Algorithm for Mixtures of Factor Analyzers,” Technical Report CRG-TR-96-1, Univ. of Toronto, 1996.
- [12] T. Hastie and R. Tibshirani, “Discriminant Analysis by Gaussian Mixtures,” J. Royal Statistical Soc. Series B, vol. 58, pp. 158-176, 1996.
- [13] D. Heckerman, “A Tutorial on Learning with Bayesian Networks,” Microsoft Research Technical Report MSR-TR-95-06, 1995.
- [15] J.Q. Li, “Estimation of Mixture Models,” PhD dissertation, Dept. of Statistics, Yale Univ., 1999.
- [16] Z. Lu and T. Leen, “Semi-Supervised Learning with Penalized Probabilistic Clustering,” Proc. Neural Information Processing Systems, 2005.
- [18] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions. John Wiley & Sons, 1997.
- [19] G. McLachlan and D. Peel, Finite Mixture Models. John Wiley & Sons, 2000.
- [20] G.J. McLachlan and K.E. Basford, Mixture Models: Inference and Applications to Clustering. Marcel Dekker, 1988.
- [21] R.M. Neal and G.E. Hinton, “A New View of the EM Algorithm that Justifies Incremental, Sparse and Other Variants,” Learning in Graphical Models, M.I. Jordan, ed., pp. 355-368, Kluwer Academic, 1998.
- [22] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, “Text Classification from Labeled and Unlabeled Documents Using EM,” Machine Learning, vol. 39, nos. 2-3, pp. 103-134, 2000.
- [24] C.K. Reddy, “TRUST-TECH Based Methods for Optimization and Learning,” PhD dissertation, Cornell Univ., 2007.
- [26] C.K. Reddy, Y.C. Weng, and H.D. Chiang, “Refining Motifs by Improving Information Content Scores Using Neighborhood Profile Search,” BMC Algorithms for Molecular Biology, vol. 1, no. 23, pp. 1-14, 2006.
- [27] R.A. Redner and H.F. Walker, “Mixture Densities, Maximum Likelihood and the EM Algorithm,” SIAM Rev., vol. 26, pp. 195-239, 1984.
- [28] S. Richardson and P. Green, “On Bayesian Analysis of Mixture Models with Unknown Number of Components,” J. Royal Statistical Soc. Series B, vol. 59, no. 4, pp. 731-792, 1997.
- [33] P. Smyth, “Model Selection for Probabilistic Clustering Using Cross-Validated Likelihood,” Statistics and Computing, vol. 10, no. 1, pp. 63-72, 2002.
- [37] J.J. Verbeek, N. Vlassis, and B. Krose, “Efficient Greedy Learning of Gaussian Mixture Models,” Neural Computation, vol. 15, no. 2, pp. 469-485, 2003.
- [38] L.R. Welch, “Hidden Markov Models and the Baum-Welch Algorithm,” IEEE Information Theory Soc. Newsletter, vol. 53, no. 4, 2003.