Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR'04)
An Empirical Study of Statistical Language Models for Contextual Post-Processing of Chinese Script Recognition
Kokubunji, Tokyo, Japan
October 26-29, 2004
ISBN: 0-7695-2187-8
Yuan-Xiang Li, National University of Singapore
Chew Lim Tan, National University of Singapore
Statistical language models (LMs) are crucial for improving the accuracy of offline Chinese script recognition. In this paper, we investigate how several LMs influence the contextual post-processing performance of Chinese script recognition. We first introduce seven LMs: three conventional LMs (character-based bigram, character-based trigram, and word-based bigram), two class-based bigram LMs, and two hybrid bigram LMs that combine word-based and class-based bigrams. We then investigate how the LMs' perplexities are affected by training corpus size, smoothing method, and count cutoffs. Next, we demonstrate the LMs' influence on post-processing performance in terms of recognition accuracy, memory requirement, and processing speed. Finally, we propose how to select a suitable LM for real recognition tasks.
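To make the perplexity comparison concrete, the following is a minimal sketch of a character-based bigram LM with a count cutoff and perplexity evaluation. It is not the authors' implementation: the abstract does not name the smoothing methods studied, so add-k smoothing is assumed here purely for illustration, and the function names (`train_bigram`, `bigram_prob`, `perplexity`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

def train_bigram(corpus, cutoff=0):
    """Count character bigrams; discard those seen `cutoff` times or fewer."""
    bigrams = Counter()
    vocab = {EOS}
    for sentence in corpus:
        chars = [BOS] + list(sentence) + [EOS]
        vocab.update(sentence)
        bigrams.update(zip(chars[:-1], chars[1:]))
    # Apply the count cutoff (assumed form: keep counts strictly above it).
    kept = {bg: c for bg, c in bigrams.items() if c > cutoff}
    # History totals are recomputed from the kept bigrams so that the
    # smoothed distribution below still sums to one for every history.
    history = Counter()
    for (h, _), c in kept.items():
        history[h] += c
    return kept, history, vocab

def bigram_prob(model, h, w, k=1.0):
    """Add-k smoothed P(w | h); add-k is an illustrative assumption."""
    kept, history, vocab = model
    return (kept.get((h, w), 0) + k) / (history.get(h, 0) + k * len(vocab))

def perplexity(model, sentence, k=1.0):
    """Per-token perplexity of one sentence, including the end marker."""
    chars = [BOS] + list(sentence) + [EOS]
    log_sum = sum(math.log(bigram_prob(model, h, w, k))
                  for h, w in zip(chars[:-1], chars[1:]))
    return math.exp(-log_sum / (len(chars) - 1))

# Toy corpus (hypothetical); a real study would train on a large Chinese corpus.
model = train_bigram(["abab", "abba"], cutoff=0)
print(perplexity(model, "abab"), perplexity(model, "bbbb"))
```

In a post-processing setting, a model of this kind would score the candidate characters proposed by the recognizer, and a lower perplexity on held-out text suggests a better fit. Raising `cutoff` shrinks the stored bigram table, which is the memory-versus-accuracy trade-off the abstract refers to.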
Citation:
Yuan-Xiang Li, Chew Lim Tan, "An Empirical Study of Statistical Language Models for Contextual Post-Processing of Chinese Script Recognition," Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR'04), pp. 257-262, 2004