Lexicon-Driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading
November 2002 (vol. 24 no. 11)
pp. 1425-1437

Abstract: This paper describes a handwritten character string recognition system for reading Japanese mail addresses with a very large vocabulary. Address phrases are recognized as a whole because there is no extra space between words. The lexicon contains 111,349 address phrases, stored in a trie structure. During recognition, the text line image is matched against the lexicon entries (phrases) to obtain a reliable segmentation and retrieve valid address phrases. We first introduce some effective techniques for text line image preprocessing and presegmentation. In presegmentation, the text line image is separated into primitive segments by connected component analysis and by splitting touching patterns based on contour shape analysis. In lexicon matching, consecutive segments are dynamically combined into candidate character patterns. An accurate character classifier is embedded in lexicon matching to select, from a dynamic category set, the characters that match a candidate pattern. A beam search strategy controls the lexicon matching so as to achieve real-time recognition. In experiments on 3,589 live mail images, the proposed method achieved a correct rate of 83.68 percent with an error rate of less than 1 percent.
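
The lexicon matching outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `TrieNode`, `build_trie`, `match_lexicon`, `toy_classify`, and `TOY_SHAPES` are hypothetical, the merge limit and beam width are arbitrary, and the toy classifier merely compares strings where a real system would score image features. It only shows how a trie can restrict the candidate characters (a dynamic category set) while a beam search combines consecutive primitive segments into character patterns.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    is_phrase_end: bool = False


def build_trie(phrases):
    """Store every lexicon phrase character by character in a trie."""
    root = TrieNode()
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.is_phrase_end = True
    return root


def match_lexicon(segments, root, classify, max_merge=3, beam_width=5):
    """Return (cost, phrase) for the cheapest phrase covering all segments, or None.

    classify(pattern, ch) is an assumed interface: it returns a cost for
    reading the merged segments `pattern` as the character `ch`.
    """
    n = len(segments)
    # beams[i] holds hypotheses (cost, text, trie node) that consumed i segments.
    beams = [[] for _ in range(n + 1)]
    beams[0] = [(0.0, "", root)]
    for i in range(n):
        # Beam pruning: keep only the cheapest partial hypotheses.
        beams[i] = sorted(beams[i], key=lambda h: h[0])[:beam_width]
        for cost, text, node in beams[i]:
            for k in range(1, max_merge + 1):
                if i + k > n:
                    break
                pattern = segments[i:i + k]  # candidate character pattern
                # Dynamic category set: only characters the trie allows next.
                for ch, child in node.children.items():
                    beams[i + k].append(
                        (cost + classify(pattern, ch), text + ch, child)
                    )
    finals = [(c, t) for c, t, node in beams[n] if node.is_phrase_end]
    return min(finals) if finals else None


# Toy stand-in for a character classifier (assumption): a character matches
# when the merged segments spell its hypothetical "shape" string.
TOY_SHAPES = {"川": "丿丨丨"}


def toy_classify(pattern, ch):
    return 0.0 if "".join(pattern) == TOY_SHAPES.get(ch, ch) else 1.0


lexicon = build_trie(["品川", "品濃"])
# Four primitive segments: the last three together form one character.
result = match_lexicon(["品", "丿", "丨", "丨"], lexicon, toy_classify)
print(result)  # best phrase is 品川 with cost 0.0
```

In this sketch, pruning happens once per segment position, so the work per position is bounded by the beam width times the merge limit times the number of trie children, which is the kind of bound a real-time system needs on a lexicon of this size.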

Index Terms:
Mail address reading, handwritten character string recognition, touching character splitting, character classification, lexicon matching, beam search.
Citation:
Cheng-Lin Liu, Masashi Koga, Hiromichi Fujisawa, "Lexicon-Driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 11, pp. 1425-1437, Nov. 2002, doi:10.1109/TPAMI.2002.1046151