Ninth International Workshop on Frontiers in Handwriting Recognition (2004)
Kokubunji, Tokyo, Japan
Oct. 26, 2004 to Oct. 29, 2004
ISSN: 1550-5235
ISBN: 0-7695-2187-8
pp: 257-262
Chew Lim Tan , National University of Singapore
Yuan-Xiang Li , National University of Singapore
Statistical language models (LMs) are crucial for improving the accuracy of offline Chinese script recognition. In this paper, we investigate how several LMs influence the performance of contextual post-processing in Chinese script recognition. We first introduce seven LMs: three conventional LMs (character-based bigram, character-based trigram, and word-based bigram), two class-based bigram LMs, and two hybrid bigram LMs that combine word-based bigrams with class-based bigrams. We then investigate how the LMs' perplexities are affected by training corpus size, smoothing method, and count cutoffs. Next, we demonstrate the influence of these LMs on post-processing performance in terms of recognition accuracy, memory requirement, and processing speed. Finally, we propose how to select a suitable LM for real recognition tasks.
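To make the terms "perplexity", "smoothing" and "count cutoff" concrete, here is a minimal sketch of a character-based bigram LM and its perplexity, PP = 2^(-(1/N) * sum log2 P(c_i | c_{i-1})). It is an illustration under stated assumptions, not the authors' method: it uses add-one (Laplace) smoothing only because it is the simplest smoothing scheme (the abstract does not say which smoothing methods the paper compares), it assumes a cutoff discards every bigram seen `cutoff` times or fewer, and the function names `train_char_bigram`, `prob` and `perplexity` and the `<unk>` token are hypothetical.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for characters unseen in training


def train_char_bigram(text, cutoff=0):
    """Count character bigrams; assumed cutoff rule: drop bigrams seen <= cutoff times."""
    vocab = set(text) | {UNK}
    bigrams = Counter(zip(text, text[1:]))
    if cutoff > 0:
        bigrams = Counter({bg: c for bg, c in bigrams.items() if c > cutoff})
    # History counts are recomputed AFTER the cutoff so each P(. | prev) still sums to 1.
    history = Counter()
    for (prev, _), c in bigrams.items():
        history[prev] += c
    return vocab, bigrams, history


def prob(prev, cur, model):
    """Add-one smoothed bigram probability P(cur | prev) = (c(prev,cur)+1) / (c(prev)+V)."""
    vocab, bigrams, history = model
    return (bigrams[(prev, cur)] + 1) / (history[prev] + len(vocab))


def perplexity(text, model):
    """Per-bigram perplexity of `text` under `model`; needs at least two characters."""
    vocab = model[0]
    chars = [c if c in vocab else UNK for c in text]
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        raise ValueError("perplexity needs at least two characters")
    log_prob = sum(math.log2(prob(p, c, model)) for p, c in pairs)
    return 2 ** (-log_prob / len(pairs))


if __name__ == "__main__":
    model = train_char_bigram("abababab")
    print(perplexity("abab", model))  # familiar pattern: low perplexity
    print(perplexity("aabb", model))  # unfamiliar pattern: higher perplexity
```

A real Chinese corpus would be iterated over the same way (Python strings index by Unicode character), and a word-based bigram would apply the same counting to a segmented word sequence instead of characters. Raising `cutoff` shrinks the stored bigram table, which is the memory-versus-perplexity trade-off the abstract refers to.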
Chew Lim Tan, Yuan-Xiang Li, "An Empirical Study of Statistical Language Models for Contextual Post-Processing of Chinese Script Recognition", Ninth International Workshop on Frontiers in Handwriting Recognition, pp. 257-262, 2004, doi:10.1109/IWFHR.2004.15