Language Engineering Conference (LEC'02)
A Multi-Font OCR System for Printed Telugu Text
Hyderabad, India
December 13-December 15
ISBN: 0-7695-1885-0
This work describes the design and development of a Telugu Optical Character Recognition system for printed text (TOSP). The pre-processing tasks considered in this paper are conversion of a grey-scale image to a binary image, image rectification, skew detection and removal, and segmentation of text into lines, words and basic symbols. Basic symbols are the fundamental unit of segmentation in this paper, and they are the units passed to the recognizer. The combinations of these basic symbols that together form the characters and compound characters of Telugu are then determined to complete the recognition process. The special feature of TOSP is that it is designed to handle multiple sizes and multiple fonts. Further, the output produced by TOSP can be opened and edited directly in any Indian-language software that supports transliteration into Telugu script. Several such software packages are popular and available.
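As an illustration of two of the pre-processing steps named in the abstract, the following minimal Python sketch converts a grey-scale image to a binary one and then segments it into text lines. It is not taken from the paper: the fixed threshold of 128, the horizontal projection approach to line segmentation, and the function names `binarize` and `segment_lines` are all assumptions chosen for illustration, and the abstract does not say which methods TOSP actually uses.

```python
from typing import List, Tuple


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map grey levels (0-255) to 1 for ink (dark pixels) and 0 for background.

    The fixed threshold is an assumption; the paper may use another method.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_lines(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows containing ink.

    This is a horizontal projection sketch: a row belongs to a text line
    if it has at least one ink pixel, and blank rows separate lines.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # the current text line ends
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(binary)))
    return lines


# Toy 6x4 "page": two bands of dark pixels separated by blank rows.
page = [
    [255, 255, 255, 255],
    [255,   0,  30, 255],
    [255,  10, 255, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
    [  0, 255, 255,  20],
]
print(segment_lines(binarize(page)))  # [(1, 3), (5, 6)]
```

The same projection idea, applied to columns within each line, is one common way to split lines into words and symbols, though skewed or touching text needs the rectification and skew removal steps first.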
Citation:
C. Vasantha Lakshmi, C. Patvardhan, "A Multi-Font OCR System for Printed Telugu Text," lec, pp.7, Language Engineering Conference (LEC'02), 2002