Handwritten Chinese Text Recognition by Integrating Multiple Contexts
Aug. 2012 (vol. 34 no. 8)
pp. 1469-1481
Fei Yin, National Laboratory of Pattern Recognition, Institute of Automation, Beijing, China
Qiu-Feng Wang, National Laboratory of Pattern Recognition, Institute of Automation, Beijing, China
Cheng-Lin Liu, National Laboratory of Pattern Recognition, Institute of Automation, Beijing, China
This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve search efficiency and, at the same time, use a candidate character augmentation strategy to improve recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on the Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved a character-level accurate rate of 90.75 percent and a correct rate of 91.39 percent, which are far superior to the best results reported in the literature.
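The abstract describes path evaluation as a weighted combination of context scores, searched with beam search over a character oversegmentation lattice. The sketch below illustrates that general idea under our own simplifying assumptions: each candidate character contributes a weighted sum of a recognizer log-posterior, a geometric log-score, and a bigram log-probability, with fixed weights. It is not the paper's evaluation function, its refined beam search, or its candidate augmentation, and the callables `candidates`, `geo_logp`, and `bigram_logp` are hypothetical interfaces introduced only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces (assumptions for illustration, not from the paper):
#   candidates(i, j)     -> [(char, log posterior), ...] for the pattern formed by
#                           merging primitive segments i..j-1
#   geo_logp(i, j)       -> log-score that segments i..j-1 form one character
#   bigram_logp(prev, c) -> log P(c | prev); prev is None at the start of the line
Candidates = Callable[[int, int], list]
GeoScore = Callable[[int, int], float]
LMScore = Callable[[Optional[str], str], float]


@dataclass(frozen=True)
class Path:
    score: float  # accumulated weighted log-score
    chars: tuple  # characters recognized so far


def beam_search(n_segments: int, candidates: Candidates, geo_logp: GeoScore,
                bigram_logp: LMScore, weights=(1.0, 1.0, 1.0),
                beam_width: int = 10, max_merge: int = 3):
    """Left-to-right beam search over an oversegmentation lattice.

    A partial path ending at boundary j extends a path ending at boundary i
    (j - max_merge <= i < j) by one candidate character spanning segments i..j-1.
    Only the beam_width best partial paths are kept at each boundary.
    """
    w_rec, w_geo, w_lm = weights
    beams = {0: [Path(0.0, ())]}
    for j in range(1, n_segments + 1):
        expanded = []
        for i in range(max(0, j - max_merge), j):
            cands = candidates(i, j)
            if not cands:
                continue
            g = geo_logp(i, j)
            for path in beams.get(i, []):
                prev = path.chars[-1] if path.chars else None
                for c, rec in cands:
                    # Simplified log-linear score: weighted sum of context log-scores.
                    s = path.score + w_rec * rec + w_geo * g + w_lm * bigram_logp(prev, c)
                    expanded.append(Path(s, path.chars + (c,)))
        expanded.sort(key=lambda p: p.score, reverse=True)
        beams[j] = expanded[:beam_width]
    if not beams[n_segments]:
        raise ValueError("no complete path through the lattice")
    best = beams[n_segments][0]
    return "".join(best.chars), best.score


if __name__ == "__main__":
    # Toy lattice with 3 primitive segments; all numbers are made up.
    cands = {(0, 1): [("口", math.log(0.6)), ("日", math.log(0.3))],
             (1, 2): [("十", math.log(0.5))],
             (0, 2): [("古", math.log(0.7))],
             (2, 3): [("人", math.log(0.9))]}
    geo = {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.5, (0, 2): 0.8}
    text, score = beam_search(
        3,
        candidates=lambda i, j: cands.get((i, j), []),
        geo_logp=lambda i, j: math.log(geo.get((i, j), 0.1)),
        bigram_logp=lambda prev, c: math.log(0.01),  # uniform toy bigram
    )
    print(text)  # 古人
```

Because every extra character adds further log-terms that are at most zero, this unnormalized sum tends to favor paths with fewer, larger characters; a practical system needs some compensation for that, and the paper itself should be consulted for how its evaluation function handles it.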
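The abstract reports a character-level accurate rate (AR) and correct rate (CR) without defining them. In this line of work they are conventionally computed from an edit-distance alignment between the recognized string and the ground truth: with N_t reference characters and S_e, D_e, I_e substitution, deletion, and insertion errors, CR = (N_t - D_e - S_e)/N_t and AR = (N_t - D_e - S_e - I_e)/N_t. The sketch below assumes these conventional definitions (they are not stated on this page). Since AR also subtracts insertions, it can never exceed CR, which is consistent with 90.75 percent versus 91.39 percent above.

```python
def align_counts(ref: str, hyp: str):
    """Return (substitutions, deletions, insertions) of a minimum-edit-distance
    alignment of hyp against ref. Ties between equal-cost alignments are broken
    by Python tuple ordering, which is an arbitrary choice of this sketch."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total_cost, subs, dels, ins) for ref[:i] aligned with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # reference characters missing from hyp
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # extra characters in hyp
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, k = dp[i - 1][j - 1]
            diag = (c, s, d, k) if ref[i - 1] == hyp[j - 1] else (c + 1, s + 1, d, k)
            c, s, d, k = dp[i - 1][j]
            up = (c + 1, s, d + 1, k)    # deletion
            c, s, d, k = dp[i][j - 1]
            left = (c + 1, s, d, k + 1)  # insertion
            dp[i][j] = min(diag, up, left)
    _, s, d, k = dp[n][m]
    return s, d, k


def correct_and_accurate_rate(ref: str, hyp: str):
    """CR ignores insertions; AR additionally penalizes them."""
    nt = len(ref)
    s, d, k = align_counts(ref, hyp)
    cr = (nt - d - s) / nt
    ar = (nt - d - s - k) / nt
    return cr, ar


if __name__ == "__main__":
    # One inserted character: CR is unaffected, AR drops.
    print(correct_and_accurate_rate("古人云", "古人人云"))  # (1.0, 0.6666666666666666)
```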

Index Terms:
text analysis, Bayes methods, handwritten character recognition, learning (artificial intelligence), natural languages, pattern classification, probability, search problems, character-level correct rate, handwritten Chinese text offline recognition, integrated segmentation-and-recognition framework, character oversegmentation, path search, parameter estimation, multiple contexts, character recognition scores, geometric contexts, linguistic contexts, Bayesian decision, classifier, posterior probabilities, confidence transformation, beam search algorithm, search efficiency improvement, candidate character augmentation strategy, recognition accuracy improvement, supervised learning, path evaluation function optimization, maximum character accuracy criterion, recognition performance, Chinese handwriting database, CASIA-HWDB, unconstrained handwritten texts, text line recognition performance improvement, handwritten pages, character-level accurate rate, Character recognition, Text recognition, Context, Handwriting recognition, Hidden Markov models, Image segmentation, Lattices, maximum character accuracy training, Handwritten Chinese text recognition, geometric models, language models, refined beam search, candidate character augmentation
Citation:
Fei Yin, Qiu-Feng Wang, Cheng-Lin Liu, "Handwritten Chinese Text Recognition by Integrating Multiple Contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 8, pp. 1469-1481, Aug. 2012, doi:10.1109/TPAMI.2011.264