7th IEEE International Conference on Computer and Information Technology (CIT 2007)
Identification of Japanese and English Script from a Single Document Page
Aizu-Wakamatsu City, Fukushima, Japan
October 16-19, 2007
ISBN: 0-7695-2983-6
S. Chanda, Indian Statistical Institute, Kolkata-108, India
U. Pal, Indian Statistical Institute, Kolkata-108, India
F. Kimura, Mie University, 1577 Kurima-machiya, Tsu, 514-8507, Japan
In Japanese documents, a single text line may contain both Japanese and English script. For Optical Character Recognition (OCR) of such a page, it is better to first identify the Japanese and English portions and then run a script-specific OCR engine on each identified portion, which yields higher OCR accuracy. This paper proposes an automatic technique for identifying the Japanese and English script portions within a single line of a printed document page. To the best of our knowledge, this is the first work of its kind. The document is first segmented into lines, and the lines are then segmented into characters. In the proposed scheme, individual scripts are identified using a combination of features derived from, among others, the structural shape of characters, pitch information, topological properties, and the water reservoir concept. In an experiment on 11304 characters, the proposed scheme achieved 98.79% identification accuracy.
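The abstract does not say how the line and character segmentation is carried out. As an illustration only, the sketch below uses projection profiles, a common choice for printed text; this technique is an assumption here, not necessarily the authors' method. The names `runs_of_ink`, `segment_lines`, and `segment_characters` are hypothetical, and the page is assumed to be a binarized image with 1 for ink and 0 for background.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed input format)


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the horizontal projection profile."""
    row_profile = [sum(row) for row in image]
    return runs_of_ink(row_profile)


def segment_characters(image: Image, line: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Split one text line into characters using the vertical projection profile."""
    top, bottom = line
    width = len(image[0]) if image else 0
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs_of_ink(col_profile)


# Toy page: two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
chars = [segment_characters(page, ln) for ln in lines]
```

Each resulting column range would then be passed to the script classifier. Note that a plain vertical projection can split a Japanese character made of horizontally separated strokes into several pieces, so a practical system would need an additional merging rule that this sketch omits.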
Citation:
S. Chanda, U. Pal, F. Kimura, "Identification of Japanese and English Script from a Single Document Page," 7th IEEE International Conference on Computer and Information Technology (CIT 2007), pp. 656-661, 2007