Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
Language Identification of Character Images Using Machine Learning Techniques
Seoul, Korea
August 31-September 1, 2005
ISBN: 0-7695-2420-6
Ying-Ho Liu, Institute of Information Science, Academia Sinica, Taipei, Taiwan
Fu Chang, Institute of Information Science, Academia Sinica, Taipei, Taiwan
Chin-Chin Lin, National Taipei University of Technology, Taipei, Taiwan
In this paper, we propose a new approach for identifying the language type of character images. We do this by classifying individual character images to determine the language boundaries in multilingual documents. Two effective methods are considered for this purpose: the prototype classification method and support vector machines (SVM). Due to the large size of our training dataset, we further propose a technique to speed up the training process for both methods. Applying the two methods to classifying characters into Chinese, English, and Japanese (including Hiragana and Katakana) has produced very accurate and comparable test results.
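As a rough illustration of the idea in the abstract (classify each character image, then read language boundaries off the sequence of labels), the following is a minimal sketch and not the authors' method. It assumes each character image has already been reduced to a small feature vector, and it swaps in the simplest possible prototype scheme: one class-mean prototype per language with nearest-prototype assignment. The abstract does not describe how the paper builds its prototypes or features. The function names `class_means`, `classify`, and `language_runs`, and the toy two-dimensional features, are hypothetical.

```python
from itertools import groupby
from math import dist


def class_means(samples):
    """Build one prototype per label as the mean of its feature vectors.

    Assumption: a single class-mean prototype per language. The paper's
    prototype construction is not given in the abstract.
    """
    protos = {}
    for label, vecs in samples.items():
        n = len(vecs)
        protos[label] = [sum(col) / n for col in zip(*vecs)]
    return protos


def classify(vec, protos):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda label: dist(vec, protos[label]))


def language_runs(labels):
    """Collapse per-character labels into (label, start, end_exclusive) runs.

    Each change of label between neighbouring characters marks a
    language boundary.
    """
    runs, pos = [], 0
    for label, group in groupby(labels):
        n = len(list(group))
        runs.append((label, pos, pos + n))
        pos += n
    return runs


# Toy 2-D features standing in for real character-image features.
train = {
    "Chinese": [[0.9, 0.8], [0.8, 0.9]],
    "English": [[0.1, 0.2], [0.2, 0.1]],
    "Japanese": [[0.1, 0.9], [0.2, 0.8]],
}

# One text line of five character feature vectors, in reading order.
line = [[0.85, 0.9], [0.9, 0.8], [0.15, 0.1], [0.2, 0.2], [0.1, 0.85]]

protos = class_means(train)
labels = [classify(v, protos) for v in line]
print(language_runs(labels))
```

An SVM could replace `classify` without changing `language_runs`, since the boundary step only needs one label per character.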
Citation:
Ying-Ho Liu, Fu Chang, Chin-Chin Lin, "Language Identification of Character Images Using Machine Learning Techniques," Eighth International Conference on Document Analysis and Recognition (ICDAR'05), pp. 630-634, 2005