Training Hidden Markov Models with Multiple Observations: A Combinatorial Method
April 2000 (vol. 22 no. 4)
pp. 371-377

Abstract—Hidden Markov models (HMMs) are stochastic models capable of statistical learning and classification. They have been applied in speech recognition and handwriting recognition because of their great adaptability and versatility in handling sequential signals. However, because these models have a complex structure and because the data sets involved usually contain uncertainty, it is difficult to analyze the multiple observation training problem without certain assumptions. For many years, researchers have used Levinson's training equations in speech and handwriting applications, simply assuming that all observations are independent of one another. This paper presents a formal treatment of HMM multiple observation training that does not impose this assumption. In this treatment, the multiple observation probability is expressed as a combination of individual observation probabilities without loss of generality. The combinatorial method gives additional freedom in making different dependence-independence assumptions. By generalizing Baum's auxiliary function into this framework and building an associated objective function using the Lagrange multiplier method, it is proven that the derived training equations guarantee maximization of the objective function. Furthermore, we show that Levinson's training equations can easily be derived as a special case of this treatment.
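For readers who want a concrete picture of the training setup the abstract describes, the sketch below performs one Baum-Welch reestimation step over multiple discrete observation sequences, combining per-sequence expected counts through weights `w_k`. The weights stand in for the combinatorial coefficients discussed in the paper; setting every `w_k = 1` corresponds to Levinson's independence assumption. The function names and the discrete-HMM formulation are illustrative assumptions, not the paper's notation or exact equations.

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Scaled forward-backward pass for one discrete observation sequence.

    Returns gamma (state posteriors), xi (transition posteriors), and the
    sequence log-likelihood.  A: NxN transitions, B: NxM emissions, pi: N priors.
    """
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    c = np.zeros(T)                       # per-step scaling factors
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                  # already normalized per time step
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A
                 * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / c[t + 1]
    return gamma, xi, np.log(c).sum()

def reestimate(A, B, pi, sequences, weights=None):
    """One reestimation step over multiple observation sequences.

    Per-sequence weights w_k generalize how individual observation statistics
    are combined; uniform weights (w_k = 1) recover the classical equations
    derived under the independence assumption.
    """
    N, M = B.shape
    if weights is None:
        weights = np.ones(len(sequences))
    num_A = np.zeros((N, N)); den_A = np.zeros(N)
    num_B = np.zeros((N, M)); den_B = np.zeros(N)
    new_pi = np.zeros(N)
    for w, obs in zip(weights, sequences):
        gamma, xi, _ = forward_backward(A, B, pi, obs)
        new_pi += w * gamma[0]
        num_A += w * xi.sum(axis=0)
        den_A += w * gamma[:-1].sum(axis=0)
        for t, o in enumerate(obs):
            num_B[:, o] += w * gamma[t]
        den_B += w * gamma.sum(axis=0)
    return (num_A / den_A[:, None],
            num_B / den_B[:, None],
            new_pi / new_pi.sum())
```

Each reestimated row remains a valid probability distribution regardless of the weights chosen, which is the invariant the paper's objective-function argument is built around.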

[1] L.E. Baum and T. Petrie, “Statistical Inference for Probabilistic Functions of Finite State Markov Chains,” Annals of Math. Statistics, vol. 37, pp. 1554-1563, 1966.
[2] L.E. Baum and J.A. Eagon, “An Inequality with Applications to Statistical Estimation for Probabilistic Functions of a Markov Process and to a Model for Ecology,” Bull. Amer. Math. Soc., vol. 73, pp. 360-363, 1967.
[3] L.E. Baum and G.R. Sell, “Growth Functions for Transformations on Manifolds,” Pacific J. Math., vol. 27, no. 2, pp. 211-227, 1968.
[4] L.E. Baum, T. Petrie, G. Soules, and N. Weiss, “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains,” Annals of Math. Statistics, vol. 41, no. 1, pp. 164-171, 1970.
[5] L.E. Baum, “An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Processes,” Inequalities, vol. 3, pp. 1-8, 1970.
[6] C.F.J. Wu, “On the Convergence Properties of the EM Algorithm,” Annals of Statistics, vol. 11, no. 1, pp. 95-103, 1983.
[7] S.E. Levinson, L.R. Rabiner, and M.M. Sondhi, “An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition,” Bell System Technical J., vol. 62, no. 4, pp. 1035-1074, 1983.
[8] L.R. Bahl, F. Jelinek, and R.L. Mercer, “A Maximum Likelihood Approach to Continuous Speech Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 5, pp. 179-190, 1983.
[9] L.R. Rabiner and S.E. Levinson, “A Speaker-Independent, Syntax-Directed, Connected Word Recognition System Based on Hidden Markov Models and Level Building,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 33, no. 3, pp. 561-573, 1985.
[10] L.R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257-285, 1989.
[11] K.-F. Lee, Automatic Speech Recognition: The Development of the SPHINX System. Kluwer Academic, 1989.
[12] L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Upper Saddle River, N.J., 1993.
[13] M. Iwayama, N. Indurkhya, and H. Motoda, “A New Algorithm for Automatic Configuration of Hidden Markov Models,” Proc. Fourth Int'l Workshop Algorithmic Learning Theory (ALT '93), pp. 237-250, 1993.
[14] A. Kaltenmeier, T. Caesar, J.M. Gloger, and E. Mandler, “Sophisticated Topology of Hidden Markov Models for Cursive Script Recognition,” Proc. Second Int'l Conf. Document Analysis and Recognition, pp. 139-142, 1993.
[15] P. Baldi and Y. Chauvin, “Smooth On-Line Learning Algorithms for Hidden Markov Models,” Neural Computation, vol. 6, pp. 307-318, 1994.
[16] S.B. Cho and J.H. Kim, “An HMM/MLP Architecture for Sequence Recognition,” Neural Computation, vol. 7, pp. 358-369, 1995.
[17] J. Dai, “Robust Estimation of HMM Parameters Using Fuzzy Vector Quantization and Parzen's Window,” Pattern Recognition, vol. 28, no. 1, pp. 53-57, 1995.
[18] A. Kundu, Y. He, and P. Bahl, “Recognition of Handwritten Word: First and Second Order Hidden Markov Model Based Approach,” Pattern Recognition, vol. 22, no. 3, pp. 283-297, Mar. 1989.
[19] S.R. Veltman and R. Prasad, “Hidden Markov Models Applied to On-Line Handwritten Isolated Character Recognition,” IEEE Trans. Image Processing, vol. 3, no. 3, pp. 314-318, 1994.
[20] E.J. Bellagarda, J.R. Bellagarda, D. Nahamoo, and K.S. Nathan, “A Fast Statistical Mixture Algorithm for On-Line Handwriting Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 12, pp. 1227-1233, Dec. 1994.
[21] G. Rigoll, A. Kosmala, J. Rottland, and C.H. Neukirchen, “A Comparison between Continuous and Discrete Density Hidden Markov Models for Cursive Handwriting Recognition,” Proc. Int'l Conf. Pattern Recognition (ICPR '96), pp. 205-209, 1996.
[22] J. Hu, M.K. Brown, and W. Turin, “HMM Based On-Line Handwriting Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 10, pp. 1039-1044, Oct. 1996.
[23] X. Li, M. Parizeau, and R. Plamondon, “Hidden Markov Model Multiple Observation Training,” Technical Report EPM/RT-99/16, Nov. 1999.

Index Terms:
Hidden Markov model, forward-backward procedure, Baum-Welch algorithm, multiple observation training.
Xiaolin Li, Marc Parizeau, Réjean Plamondon, "Training Hidden Markov Models with Multiple Observations: A Combinatorial Method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 4, pp. 371-377, April 2000, doi:10.1109/34.845379