Scene Text Recognition Using Similarity and a Lexicon with Sparse Belief Propagation
October 2009 (vol. 31 no. 10)
pp. 1733-1746
Jerod J. Weinman, Grinnell College, Grinnell
Erik Learned-Miller, University of Massachusetts Amherst, Amherst
Allen R. Hanson, University of Massachusetts Amherst, Amherst
Scene text recognition (STR) is the recognition of text anywhere in the environment, such as on signs and storefronts. Compared with document recognition, STR is more challenging because of font variability, minimal language context, and uncontrolled conditions. Much of the information available for solving this problem is frequently ignored or used only sequentially. Similarity between character images, for example, is often overlooked as a source of information. Because of language priors, a recognizer may assign different labels to identical characters. Comparing characters directly to each other, rather than only to a model, helps ensure that similar instances receive the same label. Lexicons improve recognition accuracy but are generally applied post hoc. We introduce a probabilistic model for STR that integrates similarity, language properties, and lexical decision. Inference is accelerated with sparse belief propagation, a bottom-up method that shortens messages by reducing the dependency between weakly supported hypotheses. By fusing these information sources in one model, we eliminate the unrecoverable errors that result from sequential processing and thereby improve accuracy. In experiments recognizing text from images of signs in outdoor scenes, incorporating similarity reduces character recognition error by 19 percent, the lexicon reduces word recognition error by 35 percent, and sparse belief propagation reduces the number of lexicon words considered by 99.9 percent, with a 12× speedup and no loss in accuracy.
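The sparse belief propagation idea summarized above can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not give the exact pruning rule. It assumes that each node's belief is truncated to the smallest set of most probable states whose renormalized distribution lies within a KL-divergence threshold `eps` of the full belief, and that messages are then computed only over the retained states. The function names `sparse_support` and `sparse_message`, the threshold `eps`, and the pairwise factor table `psi` are hypothetical. For a belief p truncated to a set S with retained mass Z and then renormalized to q, KL(q || p) = -log Z, so the criterion reduces to keeping states until Z >= exp(-eps).

```python
import math
from typing import Dict, List, Sequence


def sparse_support(belief: Sequence[float], eps: float) -> List[int]:
    """Return the indices of the smallest set of most probable states whose
    renormalized distribution q satisfies KL(q || p) <= eps (assumed rule)."""
    total = sum(belief)
    probs = [b / total for b in belief]
    # Visit states from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # KL(q || p) = -log(retained mass), so we need retained mass >= exp(-eps).
    target = math.exp(-eps)
    kept: List[int] = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= target:
            break
    return sorted(kept)


def sparse_message(
    phi_a: Sequence[float],
    support_a: Sequence[int],
    support_b: Sequence[int],
    psi: Sequence[Sequence[float]],
) -> Dict[int, float]:
    """Sum-product message from node a to node b through pairwise factor psi.

    phi_a is the product of node a's local evidence and its other incoming
    messages. The sum runs only over a's retained states, and the message is
    evaluated only for b's retained states, so pruned hypotheses cost nothing.
    """
    return {
        xb: sum(phi_a[xa] * psi[xa][xb] for xa in support_a)
        for xb in support_b
    }


if __name__ == "__main__":
    belief = [0.70, 0.20, 0.06, 0.04]
    print(sparse_support(belief, eps=0.15))  # a peaked belief keeps few states
    print(sparse_support(belief, eps=0.10))  # a tighter threshold keeps more
```

With `eps=0.15` only the two strongest states survive, while `eps=0.10` keeps three. A larger threshold therefore trades a bounded approximation error for shorter messages. Applied to a node whose states are candidate words, a rule like this could discard most of a large lexicon, although the paper's own procedure may differ in its details.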

Index Terms:
Scene text recognition, optical character recognition, conditional random fields, factor graphs, graphical models, lexicon, language model, similarity, belief propagation, sparse belief propagation.
Citation:
Jerod J. Weinman, Erik Learned-Miller, Allen R. Hanson, "Scene Text Recognition Using Similarity and a Lexicon with Sparse Belief Propagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 10, pp. 1733-1746, Oct. 2009, doi:10.1109/TPAMI.2009.38