Hybrid Contextural Text Recognition with String Matching
September 1993 (vol. 15 no. 9)
pp. 915-925

A hybrid contextural algorithm for reading real-life documents printed in varying fonts of any size is presented. Text is recognized progressively in three passes: the first generates character hypotheses, the second generates word hypotheses, and the third verifies the word hypotheses. During the first pass, isolated characters are recognized using a dynamic contour warping classifier. Transient statistical information is collected to accelerate the recognition process and to verify hypotheses in later processing. A transient dictionary consisting of high-confidence nondictionary words is also constructed in this pass. During the second pass, word-level hypotheses are generated using hybrid contextual text processing. Nondictionary words are recognized using a modified Viterbi algorithm, a string matching algorithm utilizing n-grams, special handlers for touching characters, pragmatic handlers for numerals, punctuation, hyphens, and apostrophes, and a prefix/suffix handler. This processing usually generates several word hypotheses. During the third pass, word-level verification occurs.
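
As a rough illustration of how string matching with n-grams can turn a noisy character string into ranked word hypotheses, the following Python sketch may help. It is not the authors' implementation: the function names, the use of character bigrams padded with '#', the min_shared threshold, and the ranking by a standard dynamic-programming edit distance are all assumptions made for this example.

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded with '#' at both ends."""
    padded = "#" * (n - 1) + word.lower() + "#" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(dictionary, n=2):
    """Map each n-gram to the set of dictionary words that contain it."""
    index = defaultdict(set)
    for word in dictionary:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def edit_distance(a, b):
    """Standard dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_hypotheses(observed, index, n=2, min_shared=2, max_results=3):
    """Rank dictionary words sharing at least min_shared n-grams with the observed string.

    The n-gram index acts as a cheap filter; edit distance orders the survivors.
    Both thresholds are hypothetical choices for this sketch.
    """
    counts = defaultdict(int)
    for gram in ngrams(observed, n):
        for word in index.get(gram, ()):
            counts[word] += 1
    candidates = [w for w, c in counts.items() if c >= min_shared]
    candidates.sort(key=lambda w: (edit_distance(observed.lower(), w.lower()), w))
    return candidates[:max_results]


# Hypothetical dictionary and a classifier output with 'o' misread as '0'.
dictionary = ["recognition", "recondition", "cognition", "reception"]
index = build_index(dictionary)
print(word_hypotheses("rec0gnition", index))
```

With this toy dictionary, the top-ranked hypothesis for "rec0gnition" is "recognition". In a three-pass scheme like the one described, a later verification step would choose among the remaining hypotheses.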

Index Terms:
progressive recognition; transient statistical information; hypothesis verification; text recognition; string matching; hybrid contextural algorithm; real-life documents; character hypothesis; word hypothesis; dynamic contour warping classifier; transient dictionary; modified Viterbi algorithm; document image processing; optical character recognition
Citation:
R.M.K. Sinha, B. Prasada, G.F. Houle, M. Sabourin, "Hybrid Contextural Text Recognition with String Matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 9, pp. 915-925, Sept. 1993, doi:10.1109/34.232077