IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 21, no. 12, December 1999
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/34.817408
<p><b>Abstract</b>—To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.</p>
Keywords: Optical character recognition, adaptive classification, template matching, segmentation, document image analysis, text reader.
Yihong Xu, "Prototype Extraction and Adaptive OCR", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 21, no. 12, pp. 1280-1296, December 1999, doi:10.1109/34.817408