Minimum Classification Error Training for Online Handwriting Recognition
July 2006 (vol. 28 no. 7)
pp. 1041-1051
Alain Biem, IEEE
This paper describes an application of the Minimum Classification Error (MCE) criterion to the problem of recognizing online unconstrained-style characters and words. We describe HMM-based, character- and word-level MCE training aimed at minimizing the character or word error rate while enabling flexibility in writing style through the use of multiple allographs per character. Experiments on a writer-independent character recognition task covering alphanumerical characters and keyboard symbols show that the MCE criterion achieves more than 30 percent character error rate reduction compared to the baseline Maximum Likelihood-based system. Word recognition results, on vocabularies of 5k to 10k words, show that MCE training achieves around 17 percent word error rate reduction when compared to the baseline Maximum Likelihood system.
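For context, the MCE criterion referenced in the abstract replaces the non-differentiable 0/1 error count with a smooth surrogate: a misclassification measure contrasting the correct class's discriminant score (e.g., an HMM log-likelihood) against a soft maximum over the competing classes, passed through a sigmoid. The sketch below is a minimal illustration of that standard formulation (following Juang and Katagiri), not the paper's exact implementation; the function name and parameter defaults are illustrative choices.

```python
import math

def mce_loss(scores, true_idx, eta=2.0, gamma=1.0):
    """Smoothed MCE loss for one sample (standard Juang-Katagiri form).

    scores:   discriminant values g_k(x), one per class
              (e.g., per-class HMM log-likelihoods).
    true_idx: index of the correct class.
    eta:      weight of the L-eta soft maximum over competitors
              (eta -> infinity recovers the single best rival).
    gamma:    slope of the sigmoid smoothing the 0/1 error count.
    """
    g_true = scores[true_idx]
    competitors = [g for i, g in enumerate(scores) if i != true_idx]
    # Soft maximum over the competing classes' scores.
    anti = (1.0 / eta) * math.log(
        sum(math.exp(eta * g) for g in competitors) / len(competitors)
    )
    d = -g_true + anti  # misclassification measure: < 0 when correct wins
    # Sigmoid maps d to (0, 1): near 0 for confident correct decisions,
    # near 1 for confident errors; its gradient drives training.
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

Summing this loss over the training set gives a differentiable estimate of the error rate, which gradient descent can then minimize directly, in contrast to Maximum Likelihood, which fits each class model in isolation.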


Index Terms:
Minimum classification error, hidden Markov model, handwriting recognition, maximum likelihood, discriminative training, dynamic programming, finite state machine.
Citation:
Alain Biem, "Minimum Classification Error Training for Online Handwriting Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1041-1051, July 2006, doi:10.1109/TPAMI.2006.146