On the Recognition of Printed Characters of Any Font and Size
February 1987 (vol. 9 no. 2)
pp. 274-288
Simon Kahan, Department of Computer Science, University of Washington, Seattle, WA 98195; AT&T Bell Laboratories, Murray Hill, NJ 07974.
Theo Pavlidis, AT&T Bell Laboratories, Murray Hill, NJ 07974; Department of Electrical Engineering, State University of New York at Stony Brook, Stony Brook, NY 11794.
Henry S. Baird, AT&T Bell Laboratories, Murray Hill, NJ 07974.
We describe the current state of a system that recognizes printed text of various fonts and sizes for the Roman alphabet. The system combines several techniques to improve the overall recognition rate. Thinning and shape extraction are performed directly on a graph of the run-length encoding of a binary image. The resulting strokes and other shapes are mapped, using a shape-clustering approach, into binary features, which are then fed into a statistical Bayesian classifier. Large-scale trials have shown better than 97 percent top-choice accuracy on mixtures of six dissimilar fonts, and over 99 percent on most single fonts, over a range of point sizes. Certain remaining confusion classes are disambiguated through contour analysis, and characters suspected of being merged are broken apart and reclassified. Finally, layout and linguistic context are applied. The results are illustrated by sample pages.
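The abstract does not say exactly how the graph of the run-length encoding is built. What follows is a minimal Python sketch, not the authors' code. It assumes that each node is a maximal horizontal run of black (1) pixels in one row, and that two runs in consecutive rows are joined by an edge when their column ranges overlap (4-connectivity). The names `row_runs` and `run_graph` are hypothetical.

```python
from typing import Dict, List, Tuple

# A run is (row, start_col, end_col), with end_col inclusive.
Run = Tuple[int, int, int]


def row_runs(image: List[List[int]]) -> List[Run]:
    """Run-length encode each row of a binary image as maximal runs of 1-pixels."""
    runs: List[Run] = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1
                runs.append((r, start, c - 1))
            else:
                c += 1
    return runs


def run_graph(image: List[List[int]]) -> Tuple[List[Run], List[Tuple[int, int]]]:
    """Return the runs (graph nodes) and edges joining overlapping runs in adjacent rows.

    Assumption: runs are adjacent only if they lie in consecutive rows and
    their column intervals intersect (4-connectivity, no diagonal contact).
    """
    runs = row_runs(image)
    by_row: Dict[int, List[int]] = {}
    for i, (r, _, _) in enumerate(runs):
        by_row.setdefault(r, []).append(i)

    edges: List[Tuple[int, int]] = []
    for i, (r, s, e) in enumerate(runs):
        for j in by_row.get(r + 1, []):
            _, s2, e2 = runs[j]
            if s <= e2 and s2 <= e:  # the two column intervals intersect
                edges.append((i, j))
    return runs, edges


if __name__ == "__main__":
    img = [
        [0, 1, 1, 0, 0],
        [0, 1, 0, 1, 1],
        [1, 1, 1, 1, 0],
    ]
    runs, edges = run_graph(img)
    print(runs)   # [(0, 1, 2), (1, 1, 1), (1, 3, 4), (2, 0, 3)]
    print(edges)  # [(0, 1), (1, 3), (2, 3)]
```

In the example, run 3 (the bottom row) has two neighbors above it, so it marks a point where two branches meet. Chains of runs with a single neighbor above and below behave like stroke segments. This is one way a graph of this kind lets thinning and shape extraction work on runs rather than on individual pixels. It is offered as an illustration under the stated assumptions, not as the paper's specific procedure.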
Citation:
Simon Kahan, Theo Pavlidis, Henry S. Baird, "On the Recognition of Printed Characters of Any Font and Size," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, no. 2, pp. 274-288, Feb. 1987, doi:10.1109/TPAMI.1987.4767901