loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
N-Tuple Features for OCR Revisited
July 1996 (vol. 18 no. 7)
pp. 734-745

Abstract—N-tuple features for optical character recognition have received only scattered attention since the 1960s. Our main purpose here is to show that advances in computer technology and computer science compel renewed interest. N-tuple features are useful for printed character classification because they indicate the presence or absence of a given rigid configuration of n black and white pixels in a pattern. Desirable n-tuples fit each pattern of a specified (positive) training set of characters in at least p different shift positions, and fail to fit each pattern of a specified (negative) training set by at least nq pixels in each shift position. In this work we prove that the problem of finding a distinguishing n-tuple is NP-complete, by examining a natural subproblem with binary strings called the missing configuration problem. The NP-completeness result notwithstanding, distinguishing n-tuples are found automatically in a few seconds on contemporary workstations. We exhibit a practical search algorithm for generating, from a small training set, a collection of n-tuples with low class-conditional correlation and with specified design parameters n, p, and q. The generator, which is available on the Internet, is empirically shown to be effective through a comparison with a benchmark generator. We show experimentally that the design parameters provide a useful tradeoff between distinguishing power and generation time, and also between the conditional probabilities for the positive and negative classes. We explore the feature probabilities obtainable for various dichotomies, and show that the design parameters control the feature probabilities.

[1] 734 R. Bakis, N.M Herbst, and G. Nagy, "An Experimental Study of Machine Recognition of Hand-Printed Numerals," IEEE Trans. Systems, Science, and Cybernetics, vol. 4, no. 2, pp. 119-132, July 1968.[2] W.W. Bledsoe and I. Browning, "Pattern Recognition and Reading by Machine," Proc. Eastern Joint Computer Conf., no. 16, pp. 225-233, Dec. 1959.[3] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, pp. 32-33. John Wiley&Sons, 1973.[4] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness.New York: W.H. Freeman, 1979.[5] D.M. Jung, "Joint Feature and Classifier Design for OCR Based on a Small Training Set," PhD thesis, Rensselaer Polytechnic Inst., May 1995.[6] L.A. Kamentsky and C.N. Liu, "Computer-Automated Design of Multifont Print Recognition Logic," IBM J. Research and Development, vol. 7, no. 1, pp. 2-13, 1963.[7] L.A. Kamentsky and C.N. Liu, "A Theoretical and Experimental Study of a Model for Pattern Recognition," Computer and Information Sciences, pp. 194-218.Washington, D.C.: Spartan, 1964.[8] M.D. Levine, "Feature Extraction: A Survey," Proc. IEEE, vol. 57, no. 8, pp. 1,391-1,407, Aug. 1969.[9] C.N. Liu and G.L. Shelton, "An Experimental Investigation of a Mixed Font Print Recognition System," IEEE Trans. Electronic Computers, vol. 15, no. 6, pp. 916-925, Dec. 1966.[10] M. Nadler, "The State of the Art in Optical Character Recognition," Machine Perception of Patterns and Pictures, pp. 3-18.Teddington, Middlesex: Inst. of Physics 1972.[11] G. Nagy and X. Wang, "Automatically-Generated High-Reliability Features for Dichotomies of Printed Characters." Proc. Symp. Document Analysis and Information Retrieval,Las Vegas, Apr. 1996.[12] V. Nageshwara Rao and V. Kumar, "On the Efficiency of Parallel Backtracking," IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 4, pp. 427-437, Apr. 1993; also available as Technical Report TR 90-55, Dept. of Computer Science, Univ. of Minnesota, Minneapolis.[13] N.J. Nilsson, Principles of Artificial Intelligence. Morgan Kaufmann, 1980.[14] S.V. Rice, J. Kanai, and T. Nartker, "The Third Annual Test of OCR Accuracy," Ann. Report, UNLV Information Science Research Inst., p. 15, 1994.[15] A. Shapira, "Experiments on the Generation of Distinguishing N-tuples for Selected Character Dichotomies," Rensselaer Polytechnic Inst., ECSE Dept., Technical Report no. ECSE-OCR-20DEC95, Dec. 1995.[16] A. Shapira Home Page, URL http://www.cs.rpi.edu/~shapiraa/, Internet World Wide Web, 1996.[17] A. Shapira PhD thesis. Rensselaer Polytechnic Inst., 1996 (in preparation).[18] A. Shapira and M. Krishnamoorthy, "Enumeration of the Patterns in a Lattice," Rensselaer Polytechnic Inst. Computer Science Technical Report no. 95-9, May 1995, submitted for publication.[19] F.W.M. Stentiford, "Automatic Feature Design for Optical Character Recognition Using an Evolutionary Search Procedure," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 7, no. 3, pp. 349-355, May 1985.[20] J. Ullman, "Experiments with the n-Tuple Method of Pattern Recognition," IEEE Trans. Computers, vol. 18, no. 12, pp. 1,135-1,137, Dec. 1969.[21] X. Wang, "Improving N-Tuples and Dichotomizers in N-Tuples Based Decision Tree Classifier," master's thesis, Rensselaer Polytechnic Inst., Dec. 1995.

Index Terms:
Backtracking, character features, classification, decision trees, distinguishing string, missing configuration, n-tuples, OCR, simulated parallelism.
Citation:
Dz-Mou Jung, M.s. Krishnamoorthy, George Nagy, Andrew Shapira, "N-Tuple Features for OCR Revisited," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, pp. 734-745, July 1996, doi:10.1109/34.506795
Usage of this product signifies your acceptance of the Terms of Use.