<p>A hybrid contextural algorithm is presented for reading real-life documents printed in varying fonts of any size. Text is recognized progressively in three passes: the first generates character hypotheses, the second generates word hypotheses, and the third verifies the word hypotheses. During the first pass, isolated characters are recognized using a dynamic contour warping classifier. Transient statistical information is collected to accelerate the recognition process and to verify hypotheses in later processing. A transient dictionary consisting of high-confidence nondictionary words is also constructed in this pass. During the second pass, word-level hypotheses are generated using hybrid contextual text processing. Nondictionary words are recognized using a modified Viterbi algorithm, a string matching algorithm utilizing n-grams, special handlers for touching characters, pragmatic handlers for numerals, punctuation, hyphens, and apostrophes, and a prefix/suffix handler. This processing usually generates several word hypotheses. During the third pass, word-level verification occurs.</p>
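<p>As a rough illustration of how n-gram string matching can turn a noisy character string into ranked word hypotheses, the sketch below scores lexicon entries by character-bigram overlap. It is not the paper's algorithm: the Dice coefficient, the boundary markers, and the names char_ngrams, dice_similarity, and word_hypotheses are all assumptions made for this example.</p>

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Count character n-grams of a word, padded with boundary markers.

    The '^' and '$' markers are an assumption of this sketch; they let
    word-initial and word-final characters contribute to the score.
    """
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a, b, n=2):
    """Dice coefficient over n-gram multisets (1.0 means identical)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    overlap = sum((grams_a & grams_b).values())  # multiset intersection
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2.0 * overlap / total if total else 0.0


def word_hypotheses(ocr_string, lexicon, k=3, n=2):
    """Return the k lexicon words most similar to the OCR string.

    Each hypothesis is a (word, score) pair, best first. Ties are
    broken alphabetically so the ranking is deterministic.
    """
    scored = [(dice_similarity(ocr_string, w, n), w) for w in lexicon]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(word, round(score, 3)) for score, word in scored[:k]]


if __name__ == "__main__":
    # Hypothetical lexicon and a misrecognized string ("ni" read as "m").
    lexicon = ["recognition", "recreation", "cognition", "reception"]
    for word, score in word_hypotheses("recogmtion", lexicon):
        print(word, score)
```

<p>With this hypothetical lexicon, the misread string "recogmtion" ranks "recognition" first. Returning several scored candidates rather than a single best match mirrors the abstract's point that the second pass usually yields several word hypotheses, which a later verification pass can then resolve.</p>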
progressive recognition; transient statistical information; hypothesis verification; text recognition; string matching; hybrid contextural algorithm; real-life documents; character hypothesis; word hypothesis; dynamic contour warping classifier; transient dictionary; modified Viterbi algorithm; document image processing; optical character recognition
B. Prasada, M. Sabourin, R.M.K. Sinha, G.F. Houle, "Hybrid Contextural Text Recognition with String Matching", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 15, pp. 915-925, September 1993, doi:10.1109/34.232077