Kokubunji, Tokyo, Japan
Oct. 26, 2004 to Oct. 29, 2004
ISBN: 0-7695-2187-8
pp: 257-262
Yuan-Xiang Li, National University of Singapore
Chew Lim Tan, National University of Singapore
ABSTRACT
Statistical language models (LMs) are crucial for improving the accuracy of offline Chinese script recognition. In this paper, we investigate the influence of several LMs on the contextual post-processing performance of Chinese script recognition. We first introduce seven LMs: three conventional LMs (character-based bigram, character-based trigram, word-based bigram), two class-based bigram LMs, and two hybrid bigram LMs that combine word-based and class-based bigrams. We then investigate how the LMs' perplexities are affected by training corpus size, smoothing methods, and count cutoffs. Next, we demonstrate the influence of these LMs on post-processing performance in terms of recognition accuracy, memory requirements, and processing speed. Finally, we propose how to select a suitable LM for real recognition tasks.
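As a rough illustration of the quantities the abstract refers to, the sketch below trains a character-based bigram LM and measures its perplexity, with a smoothing constant and a count cutoff as tunable parameters. It is not the authors' implementation: the abstract does not say which smoothing methods were compared, so add-k smoothing is assumed here purely for simplicity, and the class name `CharBigramLM`, its parameters, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"  # hypothetical marker symbols


class CharBigramLM:
    """Character bigram LM with add-k smoothing (an assumed, simple choice)."""

    def __init__(self, corpus, k=1.0, cutoff=0):
        counts = Counter()
        for sentence in corpus:
            symbols = [BOS] + list(sentence) + [EOS]
            counts.update(zip(symbols, symbols[1:]))
        # Every symbol that can be predicted, plus one slot for unseen characters.
        self.vocab = {w for _, w in counts} | {UNK}
        # Count cutoff: discard bigrams seen `cutoff` times or fewer.
        self.bigrams = {bg: c for bg, c in counts.items() if c > cutoff}
        # History totals are rebuilt from the kept bigrams so that the
        # smoothed distribution still sums to one.
        self.history = Counter()
        for (h, _), c in self.bigrams.items():
            self.history[h] += c
        self.k = k

    def prob(self, h, w):
        """P(w | h) = (c(h, w) + k) / (c(h) + k * |V|)."""
        if w not in self.vocab:
            w = UNK
        numerator = self.bigrams.get((h, w), 0) + self.k
        denominator = self.history.get(h, 0) + self.k * len(self.vocab)
        return numerator / denominator

    def perplexity(self, corpus):
        """2 ** (average negative log2 probability per predicted symbol)."""
        log_sum, n = 0.0, 0
        for sentence in corpus:
            symbols = [BOS] + list(sentence) + [EOS]
            for h, w in zip(symbols, symbols[1:]):
                log_sum += math.log2(self.prob(h, w))
                n += 1
        return 2 ** (-log_sum / n)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    train = ["我们是学生", "我们是老师", "他们是学生"]
    lm = CharBigramLM(train, k=0.5, cutoff=0)
    print("perplexity on seen text:  ", lm.perplexity(["我们是学生"]))
    print("perplexity on unseen text:", lm.perplexity(["学生是我们"]))
```

Raising `cutoff` shrinks the stored bigram table (lower memory) at the cost of pushing more probability mass onto the smoothing term, which is the kind of trade-off between perplexity and model size that the abstract describes studying.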
CITATION
Yuan-Xiang Li, Chew Lim Tan, "An Empirical Study of Statistical Language Models for Contextual Post-Processing of Chinese Script Recognition", Proceedings, Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR), 2004, pp. 257-262, doi:10.1109/IWFHR.2004.15