Issue No. 12 - December (1999 vol. 21)
ISSN: 0162-8828
pp: 1280-1296
ABSTRACT
To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.
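The abstract names character-width estimation from unsegmented text but does not describe the algorithm. As a rough illustration only, the sketch below assumes a simple additive model: the measured pixel width of each word image is taken to equal the sum of the widths of the characters in its transcript, with inter-character spacing and kerning ignored. Under that assumption the per-character widths follow from a least-squares fit over many words. The function name `estimate_char_widths` and the `ridge` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter


def estimate_char_widths(words, word_widths, ridge=1e-9):
    """Least-squares estimate of per-character widths from whole-word widths.

    Assumed model (illustrative, not from the paper): each measured word
    width is the sum of the widths of its transcript characters.
    Solves the normal equations (A^T A + ridge * I) x = A^T b, where row k
    of A holds the character counts of word k and b holds the word widths.
    """
    alphabet = sorted(set("".join(words)))
    index = {ch: i for i, ch in enumerate(alphabet)}
    n = len(alphabet)

    # Accumulate A^T A and A^T b directly from per-word character counts.
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for word, width in zip(words, word_widths):
        counts = Counter(word)
        for ci, ki in counts.items():
            i = index[ci]
            atb[i] += ki * width
            for cj, kj in counts.items():
                ata[i][index[cj]] += ki * kj

    # A tiny ridge term keeps the system solvable when some characters
    # only ever appear together and cannot be separated by the data.
    for i in range(n):
        ata[i][i] += ridge

    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(ata, atb)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return dict(zip(alphabet, x))


if __name__ == "__main__":
    # Synthetic example: transcript words and their measured widths in pixels.
    words = ["ab", "bc", "ca", "abc", "aab"]
    widths = [22, 20, 18, 30, 32]
    for ch, w in estimate_char_widths(words, widths).items():
        print(ch, round(w, 2))
```

Because the fit pools evidence over many words, a few mistranscribed words shift the estimates only slightly, which is in the spirit of the tolerance to transcription errors that the abstract claims. A real system would also need to model spacing and noisy width measurements.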
INDEX TERMS
Optical character recognition, adaptive classification, template matching, segmentation, document image analysis, text reader.
CITATION

G. Nagy and Y. Xu, "Prototype Extraction and Adaptive OCR," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 21, no. 12, pp. 1280-1296, 1999.
doi:10.1109/34.817408