Issue No. 12 - December (1999 vol. 21)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/34.817408
<p><b>Abstract</b>—To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.</p>
Index Terms: Optical character recognition, adaptive classification, template matching, segmentation, document image analysis, text reader.
G. Nagy and Y. Xu, "Prototype Extraction and Adaptive OCR," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 21, no. 12, pp. 1280-1296, 1999.