CSDL Home IEEE Transactions on Pattern Analysis & Machine Intelligence 2008 vol.30 Issue No.12 - December



pp: 2236-2242

Jaemo Sung , POSTECH, Pohang

Zoubin Ghahramani , University of Cambridge, Cambridge

Sung-Yang Bang , POSTECH, Pohang

ABSTRACT

Variational Bayesian Expectation-Maximization (VBEM), an approximate inference method for probabilistic models based on a factorized approximation of the posterior over latent variables and model parameters, has become a standard technique for practical Bayesian inference. In this paper, we introduce a more general approximate inference framework for conjugate-exponential family models, which we call Latent-Space Variational Bayes (LSVB). In this approach, we integrate out the model parameters exactly, leaving only the latent variables. It can be shown that the LSVB approach gives better estimates of the model evidence and of the distribution over latent variables than the VBEM approach, but in practice, the distribution over latent variables has to be approximated. As a practical implementation, we present a First-order LSVB (FoLSVB) algorithm to approximate this distribution over latent variables. From this approximate distribution, one can estimate the model evidence and the posterior over model parameters. The FoLSVB algorithm is directly comparable to the VBEM algorithm and has the same computational complexity. We discuss how LSVB generalizes the recently proposed collapsed variational methods [20] to general conjugate-exponential families. Examples based on mixtures of Gaussians and mixtures of Bernoullis with synthetic and real-world data sets are used to illustrate some advantages of our method over VBEM.
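The key property the abstract relies on is that, in a conjugate-exponential model, the model parameters can be integrated out in closed form, leaving a marginal over latent variables only. The sketch below is not the paper's FoLSVB algorithm; it merely illustrates that mechanism, under my own assumptions, for the simplest conjugate pair (Beta prior, Bernoulli likelihood): the exact marginal likelihood obtained by analytic integration is checked against brute-force numerical integration. All function names are mine.

```python
import math

def log_beta(a, b):
    # log of the Beta function, B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_evidence_exact(x, a=1.0, b=1.0):
    """Exact log marginal likelihood of binary data under a Beta(a, b) prior.

    Conjugacy lets us integrate the Bernoulli parameter out analytically:
    p(x) = B(a + k, b + n - k) / B(a, b), with k successes in n trials.
    This is the kind of exact parameter marginalization LSVB exploits.
    """
    n, k = len(x), sum(x)
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def log_evidence_numeric(x, a=1.0, b=1.0, grid=20000):
    """Brute-force check: midpoint-rule integration of p(x|t) p(t) over t."""
    n, k = len(x), sum(x)
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        t = (i + 0.5) * h  # midpoint of each sub-interval, never 0 or 1
        log_joint = (k * math.log(t) + (n - k) * math.log(1.0 - t)      # likelihood
                     + (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t)
                     - log_beta(a, b))                                   # prior
        total += math.exp(log_joint) * h
    return math.log(total)

data = [1, 0, 1, 1, 0, 1, 1, 1]  # 6 successes in 8 trials
print(log_evidence_exact(data))          # closed form via conjugacy
print(log_evidence_numeric(data))        # should agree to high precision
```

In a mixture model the parameters can still be marginalized this way conditional on the latent assignments, but the resulting distribution over assignments no longer factorizes, which is why the paper must approximate it (and why VBEM instead keeps a factorized posterior over both parameters and latents).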

INDEX TERMS

variational Bayesian inference, machine learning, unsupervised learning, latent variable model, conjugate exponential family, variational method, mixture of Gaussians

CITATION

Jaemo Sung, Zoubin Ghahramani, Sung-Yang Bang, "Latent-Space Variational Bayes," *IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 30, no. 12, pp. 2236-2242, December 2008, doi:10.1109/TPAMI.2008.157

REFERENCES

- [1] W.H. Jefferys and J.O. Berger, "Occam's Razor and Bayesian Analysis," Am. Scientist, vol. 80, pp. 64-72, 1992.
- [2] C.E. Rasmussen and Z. Ghahramani, "Occam's Razor," Advances in Neural Information Processing Systems, vol. 13, MIT Press, 2001.
- [3] D.M. Chickering and D. Heckerman, "Efficient Approximation for the Marginal Likelihood of Bayesian Networks with Hidden Variables," Machine Learning, vol. 29, no. 2, pp. 181-212, 1997.
- [4] J.M. Bernardo and A.F.M. Smith, Bayesian Theory. John Wiley & Sons, 2000.
- [5] A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin, Bayesian Data Analysis. Chapman and Hall/CRC, 1995.
- [6] C.M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
- [7] R.M. Neal, "Probabilistic Inference Using Markov Chain Monte Carlo Methods," Technical Report CRG-TR-93-1, Dept. of Computer Science, Univ. of Toronto, 1993.
- [8] C.P. Robert and G. Casella, Monte Carlo Statistical Methods. Springer-Verlag, 1999.
- [9] C. Andrieu, N. de Freitas, A. Doucet, and M.I. Jordan, "An Introduction to MCMC for Machine Learning," Machine Learning, vol. 50, pp. 5-43, 2003.
- [10] D.J.C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge Univ. Press, 2003.
- [11] M.I. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul, "An Introduction to Variational Methods for Graphical Models," Machine Learning, vol. 37, no. 2, pp. 183-233, 1999.
- [12] T. Jaakkola, "Tutorial on Variational Approximation Methods," Advanced Mean Field Methods: Theory and Practice, M. Opper and D. Saad, eds., MIT Press, 2000.
- [13] H. Attias, "A Variational Bayesian Framework for Graphical Models," Advances in Neural Information Processing Systems, vol. 12, MIT Press, 2000.
- [14] Z. Ghahramani and M.J. Beal, "Propagation Algorithms for Variational Bayesian Learning," Advances in Neural Information Processing Systems, vol. 13, MIT Press, 2001.
- [15] Z. Ghahramani and M.J. Beal, "Variational Inference for Bayesian Mixtures of Factor Analysers," Advances in Neural Information Processing Systems, vol. 12, MIT Press, 2000.
- [16] M.J. Beal and Z. Ghahramani, "The Variational Bayesian EM Algorithm for Incomplete Data: With Application to Scoring Graphical Model Structures," Bayesian Statistics, vol. 7, Oxford Univ. Press, 2003.
- [18] J. Winn and C. Bishop, "Variational Message Passing," J. Machine Learning Research, vol. 6, pp. 661-694, 2005.
- [19] M.J. Beal, "Variational Algorithms for Approximate Bayesian Inference," PhD dissertation, Gatsby Computational Neuroscience Unit, Univ. College London, 2003.
- [20] Y.W. Teh, D. Newman, and M. Welling, "A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation," Advances in Neural Information Processing Systems, vol. 19, 2007.
- [21] Y.W. Teh, K. Kurihara, and M. Welling, "Collapsed Variational Inference for HDP," Advances in Neural Information Processing Systems, vol. 20, 2008.
- [22] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," J. Royal Statistical Soc. B, vol. 39, pp. 1-38, 1977.
- [23] G. McLachlan and D. Peel, Finite Mixture Models. Wiley-Interscience, 2000.
- [24] S. Richardson and P.J. Green, "On Bayesian Analysis of Mixtures with an Unknown Number of Components," J. Royal Statistical Soc. B, vol. 59, pp. 731-792, 1997.
- [25] A. Asuncion and D. Newman, UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html, 2007.
- [26] M. Svensén and C.M. Bishop, "Robust Bayesian Mixture Modelling," Neurocomputing, vol. 64, pp. 235-252, 2004.