7th IEEE International Conference on Computer and Information Technology (CIT 2007)
Identification of Japanese and English Script from a Single Document Page
Aizu-Wakamatsu City, Fukushima, Japan
October 16-October 19
ISBN: 0-7695-2983-6
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CIT.2007.109
In Japanese documents, a single text line of a page may contain both Japanese and English scripts. For Optical Character Recognition (OCR) of such a page, it is better to first identify the Japanese and English script portions and then run the OCR engine for each script on its respective portion, which yields higher OCR accuracy. This paper proposes an automatic technique for identifying Japanese and English script portions within a single line of a printed document page. To the best of our knowledge, this is the first work of its kind. The document is first segmented into lines, and the lines are then segmented into characters. In the proposed scheme, individual scripts are identified using a combination of features obtained from the structural shape of characters, pitch information, topological properties, the water reservoir concept, etc. In an experiment on 11304 characters, the proposed scheme achieved 98.79% identification accuracy.
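The abstract describes a two-stage segmentation (page into lines, then lines into characters) without saying how it is done. The sketch below illustrates one common way to do this, projection profiles over a binary image. This is an assumption made for illustration, not the authors' stated method. The names `runs_of_ink`, `segment_lines`, and `segment_characters`, and the toy `page`, are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ends at a blank position
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the end of the profile
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the row-wise ink count (assumed method)."""
    return runs_of_ink([sum(row) for row in image])


def segment_characters(image: Image, line: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Split one text line into character candidates using the column-wise ink count."""
    top, bottom = line
    width = len(image[0]) if image else 0
    profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs_of_ink(profile)


# Toy page: two text lines; the first holds two blobs, the second holds one.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

lines = segment_lines(page)
characters = [segment_characters(page, line) for line in lines]
```

On the toy page, `lines` holds the row ranges of the two text lines and `characters` holds the column ranges of the blobs in each. A plain column profile can split a Japanese character whose components are separated by a vertical gap, so a real system would likely need a merging step; the script-identification features named in the abstract would then be computed per character.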
Citation:
S. Chanda, U. Pal, F. Kimura, "Identification of Japanese and English Script from a Single Document Page," cit, pp.656-661, 7th IEEE International Conference on Computer and Information Technology (CIT 2007), 2007