Issue No.09 - September (1993 vol.15)
pp: 915-925
ABSTRACT
<p>The hybrid contextural algorithm for reading real-life documents printed in varying fonts of any size is presented. Text is recognized progressively in three passes. The first pass generates character hypotheses, the second generates word hypotheses, and the third verifies the word hypotheses. During the first pass, isolated characters are recognized using a dynamic contour warping classifier. Transient statistical information is collected to accelerate the recognition process and to verify hypotheses in later processing. A transient dictionary consisting of high-confidence nondictionary words is constructed in this pass. During the second pass, word-level hypotheses are generated using hybrid contextual text processing. Nondictionary words are recognized using a modified Viterbi algorithm, a string matching algorithm utilizing n-grams, special handlers for touching characters, pragmatic handlers for numerals, punctuation, hyphens, and apostrophes, and a prefix/suffix handler. This processing usually generates several word hypotheses. During the third pass, word-level verification occurs.</p>
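To make the idea of n-gram string matching concrete, the following is a minimal sketch of how several ranked word hypotheses might be generated from a noisy recognized string. It is not the method of the paper: the Dice coefficient over padded character bigrams, the function names (`char_ngrams`, `dice_similarity`, `word_hypotheses`), and the toy dictionary are all assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Return a multiset of character n-grams, padded with '#' at both ends.

    The padding symbol and the default n=2 are illustrative assumptions.
    """
    padded = "#" * (n - 1) + word.lower() + "#" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a, b, n=2):
    """Dice coefficient between the n-gram multisets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    overlap = sum((grams_a & grams_b).values())  # multiset intersection size
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2.0 * overlap / total if total else 0.0


def word_hypotheses(ocr_string, dictionary, top_k=3, n=2):
    """Rank dictionary words by n-gram similarity to a recognized string.

    Returns up to top_k (word, score) pairs, best first; ties are broken
    alphabetically so the ordering is deterministic.
    """
    scored = [(dice_similarity(ocr_string, w, n), w) for w in dictionary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(w, round(s, 3)) for s, w in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical dictionary and a misread of "recognition" ("ni" read as "m").
    lexicon = ["recognition", "recreation", "cognition"]
    for word, score in word_hypotheses("recogmtion", lexicon):
        print(word, score)
```

In a three-pass scheme like the one described, a ranked list of this kind would only propose candidates; choosing among them would be left to a later verification step.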
INDEX TERMS
progressive recognition; transient statistical information; hypothesis verification; text recognition; string matching; hybrid contextural algorithm; real-life documents; character hypothesis; word hypothesis; dynamic contour warping classifier; transient dictionary; modified Viterbi algorithm; document image processing; optical character recognition
CITATION
B. Prasada, G.F. Houle, M. Sabourin, "Hybrid Contextural Text Recognition with String Matching", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.15, no. 9, pp. 915-925, September 1993, doi:10.1109/34.232077