Variational Bayes for Continuous Hidden Markov Models and Its Application to Active Learning
April 2006 (vol. 28 no. 4)
pp. 522-532
In this paper, we present a variational Bayes (VB) framework for learning continuous hidden Markov models (CHMMs), and we examine the VB framework within active learning. Unlike a maximum likelihood or maximum a posteriori training procedure, which yields a point estimate of the CHMM parameters, VB-based training yields an estimate of the full posterior of the model parameters. This is particularly important for small training sets, since it gives a measure of confidence in the accuracy of the learned model. This is exploited within the context of active learning: we acquire labels for those feature vectors for which knowledge of the associated label would be most informative for reducing model-parameter uncertainty. Three active learning algorithms are considered in this paper: 1) query by committee (QBC), with the goal of selecting data for labeling that minimize the classification variance; 2) a maximum expected information gain method that seeks to label data with the goal of reducing the entropy of the model parameters; and 3) an error-reduction-based procedure that attempts to minimize classification error over the test data. Experimental results are presented for synthetic and measured data. We demonstrate that all of these active learning methods can significantly reduce the amount of required labeling, compared to random selection of samples for labeling.
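The query-by-committee idea described above can be sketched in a few lines: draw a committee of models from the (approximate) parameter posterior, and request a label for the pool item on which the committee's predicted labels disagree most, here measured by vote entropy. This is only a minimal illustration of the selection criterion, not the paper's CHMM implementation; the toy threshold classifiers and all names below are illustrative assumptions.

```python
import numpy as np

def vote_entropy(votes, num_classes):
    """Disagreement measure: entropy of the committee's label votes."""
    counts = np.bincount(votes, minlength=num_classes)
    p = counts / counts.sum()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def qbc_select(committee, unlabeled, num_classes):
    """Return the index of the unlabeled item the committee disagrees on most.

    committee: classifiers (callables x -> label), each corresponding to one
    parameter draw from the approximate posterior.
    """
    scores = [vote_entropy(np.array([clf(x) for clf in committee]), num_classes)
              for x in unlabeled]
    return int(np.argmax(scores))

# Toy committee of 1D threshold classifiers; pretend the thresholds were
# drawn from the VB posterior over model parameters (purely illustrative).
thresholds = [-0.5, -0.1, 0.0, 0.2, 0.6]
committee = [(lambda x, t=t: int(x > t)) for t in thresholds]
pool = [-2.0, 0.1, 2.0]   # the middle point lies near the decision boundary
picked = qbc_select(committee, pool, num_classes=2)
# picked == 1: the committee splits 3-2 on pool[1] and is unanimous elsewhere
```

In the paper's setting the committee members would instead be CHMMs sampled from the VB posterior, but the selection rule, label the item with maximal committee disagreement, is the same.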

[1] L.R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[2] A. Dempster, N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data Via the EM Algorithm,” J. Royal Statistical Soc. B, vol. 39, pp. 1-38, 1977.
[3] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, second ed. John Wiley and Sons, 2001.
[4] J.C. Spall, “Estimation Via Markov Chain Monte Carlo,” IEEE Control Systems Magazine, Apr. 2003.
[5] R.M. Neal and G.E. Hinton, “A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants,” Learning in Graphical Models, pp. 355-368, 1998.
[6] C.M. Bishop and M.E. Tipping, “Variational Relevance Vector Machines,” Proc. 16th Conf. Uncertainty in Artificial Intelligence, pp. 46-53, 2000.
[7] T. Jaakkola and M.I. Jordan, “Bayesian Parameter Estimation Via Variational Methods,” Statistics and Computing, vol. 10, pp. 25-37, 2000.
[8] D. MacKay, “Ensemble Learning for Hidden Markov Models,” technical report, Dept. of Physics, Univ. of Cambridge, 1997.
[9] H. Attias, “A Variational Bayesian Framework for Graphical Models,” Proc. Ann. Conf. Neural Information Processing Systems, 2000.
[10] T.P. Minka, “Using Lower Bounds to Approximate Integrals,” 2001.
[11] J.L. Gauvain and C.H. Lee, “Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains,” IEEE Trans. Speech and Audio Processing, vol. 2, pp. 291-298, 1994.
[12] H.S. Seung, M. Opper, and H. Sompolinsky, “Query by Committee,” Proc. Fifth Ann. ACM Workshop Computational Learning Theory, pp. 287-294, 1992.
[13] Y. Freund, H.S. Seung, E. Shamir, and N. Tishby, “Selective Sampling Using the Query by Committee Algorithm,” Machine Learning, vol. 28, pp. 133-168, 1997.
[14] S.A. Engelson and I. Dagan, “Committee-Based Sample Selection for Probabilistic Classifiers,” J. Artificial Intelligence Research, pp. 335-360, 1999.
[15] A. McCallum and K. Nigam, “Employing EM and Pool-Based Active Learning for Text Classification,” Machine Learning: Proc. 15th Int'l Conf., pp. 359-367, 1998.
[16] S. Tong and D. Koller, “Support Vector Machine Active Learning with Applications to Text Classification,” J. Machine Learning Research, vol. 2, pp. 45-66, 2001.
[17] D. MacKay, “Information-Based Objective Functions for Active Data Selection,” Neural Computation, vol. 4, pp. 589-603, 1992.
[18] D. Cohn, Z. Ghahramani, and M. Jordan, “Active Learning with Statistical Models,” J. Artificial Intelligence Research, vol. 4, pp. 129-145, 1996.
[19] S. Geman, E. Bienenstock, and R. Doursat, “Neural Networks and the Bias/Variance Dilemma,” Neural Computation, 1992.
[20] N. Roy and A. McCallum, “Toward Optimal Active Learning through Sampling Estimation of Error Reduction,” Proc. 18th Int'l Conf. Machine Learning, 2001.
[21] P. Runkle, P. Bharadwaj, and L. Carin, “Hidden Markov Model Multi-Aspect Target Classification,” IEEE Trans. Signal Processing, vol. 47, pp. 2035-2040, July 1999.
[22] T.M. Cover and J.A. Thomas, Elements of Information Theory. John Wiley and Sons, 1991.
[23] B.J. Frey and N. Jojic, “Advances in Algorithms for Inference and Learning in Complex Probability Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2003.

Index Terms:
Variational Bayes (VB), continuous hidden Markov models (CHMMs), active learning (AL), query by committee (QBC), maximum expected information gain (MEIG), error-reduction-based active learning.
Shihao Ji, Balaji Krishnapuram, Lawrence Carin, "Variational Bayes for Continuous Hidden Markov Models and Its Application to Active Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 522-532, April 2006, doi:10.1109/TPAMI.2006.85