Bayesian Estimation of Beta Mixture Models with Variational Inference
November 2011 (vol. 33 no. 11)
pp. 2160-2173
Zhanyu Ma, KTH -- Royal Institute of Technology, Stockholm
Arne Leijon, KTH -- Royal Institute of Technology, Stockholm
Bayesian estimation of the parameters in beta mixture models (BMM) is analytically intractable. Numerical methods that simulate the posterior distribution are available, but they incur high computational cost. In this paper, we introduce an approximation to the prior/posterior distribution of the parameters in the beta distribution and propose an analytically tractable (closed-form) Bayesian approach to parameter estimation. The approach is based on the variational inference (VI) framework. Following the principles of the VI framework and utilizing the relative convexity bound, the extended factorized approximation method is applied to approximate the distribution of the parameters in the BMM. In a fully Bayesian model, where all of the parameters of the BMM are treated as random variables and assigned proper distributions, our approach can asymptotically find the optimal estimate of the parameters' posterior distribution. The model complexity can also be determined from the data. Because the proposed solution is in closed form, no iterative numerical calculation is required. In addition, our approach avoids the overfitting that affects the conventional expectation-maximization (EM) algorithm. The good performance of this approach is verified by experiments with both synthetic and real data.
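
For orientation, the model whose parameters are being estimated can be sketched as follows. This is only the standard density of a finite beta mixture, not the variational estimation procedure of the paper. The one-dimensional setting, the function names beta_log_pdf and bmm_pdf, and the example weights and shape parameters are all assumptions made for illustration.

```python
import math


def beta_log_pdf(x, u, v):
    """Log density of a Beta(u, v) distribution at x, for 0 < x < 1."""
    # Log of the normalizing constant Gamma(u + v) / (Gamma(u) * Gamma(v)).
    log_norm = math.lgamma(u + v) - math.lgamma(u) - math.lgamma(v)
    return log_norm + (u - 1.0) * math.log(x) + (v - 1.0) * math.log(1.0 - x)


def bmm_pdf(x, weights, params):
    """Density of a one-dimensional beta mixture at x.

    weights: mixing weights that sum to 1.
    params:  one (u, v) pair of shape parameters per component.
    """
    return sum(
        w * math.exp(beta_log_pdf(x, u, v))
        for w, (u, v) in zip(weights, params)
    )


# Hypothetical two-component mixture, chosen only for illustration.
weights = [0.3, 0.7]
params = [(2.0, 8.0), (6.0, 3.0)]
print(bmm_pdf(0.5, weights, params))
```

A Bayesian treatment places distributions over the mixing weights and over each (u, v) pair instead of fixing them as in this example; the intractability mentioned in the abstract comes from the beta normalizing constant, which has no conjugate prior in a convenient closed form.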

Index Terms:
Bayesian estimation, maximum likelihood estimation, beta distribution, mixture modeling, variational inference, factorized approximation.
Zhanyu Ma, Arne Leijon, "Bayesian Estimation of Beta Mixture Models with Variational Inference," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2160-2173, Nov. 2011, doi:10.1109/TPAMI.2011.63