Exploitation of Unlabeled Sequences in Hidden Markov Models
December 2003 (vol. 25 no. 12)
pp. 1570-1581

Abstract: This paper presents a method for effectively using unlabeled sequential data in the learning of hidden Markov models (HMMs). With the conventional approach, class labels for unlabeled data are assigned deterministically by HMMs learned from labeled data. Such labeling often becomes unreliable when the amount of labeled data is small. We propose an extended Baum-Welch (EBW) algorithm in which the labeling is undertaken probabilistically and iteratively so that the likelihoods of both the labeled and the unlabeled data are improved. Unlike the conventional approach, the EBW algorithm guarantees convergence to a local maximum of the likelihood. Experimental results on gesture data and speech data show that, when labeled training data are scarce, the EBW algorithm uses unlabeled data to improve the classification performance of HMMs more robustly than the conventional naive labeling (NL) approach.
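
The difference between deterministic and probabilistic labeling can be illustrated with a minimal sketch. It is not the paper's EBW update. It assumes that the per-class log-likelihoods of one unlabeled sequence have already been computed by some class-conditional HMMs (for example with the forward algorithm); the function names and the numbers are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hard_label(class_log_liks, log_priors):
    """Naive (deterministic) labeling: pick the single best-scoring class."""
    scores = [ll + lp for ll, lp in zip(class_log_liks, log_priors)]
    return max(range(len(scores)), key=scores.__getitem__)


def soft_label(class_log_liks, log_priors):
    """Probabilistic labeling: posterior P(class | sequence) for every class."""
    scores = [ll + lp for ll, lp in zip(class_log_liks, log_priors)]
    norm = log_sum_exp(scores)
    return [math.exp(s - norm) for s in scores]


# Hypothetical log-likelihoods of one unlabeled sequence under two class HMMs.
log_liks = [-10.0, -10.5]
log_priors = [math.log(0.5), math.log(0.5)]

print(hard_label(log_liks, log_priors))  # index of the winning class
print(soft_label(log_liks, log_priors))  # class posteriors, summing to 1
```

With these made-up numbers the two classes are nearly tied, so the hard label commits fully to one class while the soft label keeps a substantial weight on the other. In an EM-style procedure, weights like these would typically scale each unlabeled sequence's contribution to every class model during reestimation, and the weights would be recomputed at each iteration.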


Index Terms:
Unlabeled data, sequential data, hidden Markov models, extended Baum-Welch algorithm.
Citation:
Masashi Inoue, Naonori Ueda, "Exploitation of Unlabeled Sequences in Hidden Markov Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1570-1581, Dec. 2003, doi:10.1109/TPAMI.2003.1251150