Handwritten Character Classification Using Nearest Neighbor in Large Databases
September 1994 (vol. 16 no. 9)
pp. 915-919

This paper shows that systems built on a simple statistical technique and a large training database can be automatically optimized to produce classification accuracies of 99% in the domain of handwritten digits. It also shows that the performance of these systems scales consistently with the size of the training database: the error rate is cut by more than half for every tenfold increase in the size of the training set, from 10 to 100,000 examples. Three distance metrics for the standard nearest neighbor classification system are investigated: a simple Hamming distance metric, a pixel distance metric, and a metric based on the extraction of penstroke features. Systems employing these metrics were trained and tested on a standard, publicly available database of nearly 225,000 digits provided by the National Institute of Standards and Technology. Additionally, a confidence metric is introduced by the authors and is also discovered and optimized by the system. The new confidence measure proves to be superior to the commonly used nearest neighbor distance.
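
As an illustration of the simplest of the three metrics, the following is a minimal sketch of nearest neighbor classification under Hamming distance. It is not the authors' implementation: the names `hamming_distance` and `nearest_neighbor`, the flattened 0/1 list representation of an image, and the 3x3 toy bitmaps are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Assumption: each image is a flattened binary bitmap, one 0/1 entry per pixel.
Bitmap = Sequence[int]


def hamming_distance(a: Bitmap, b: Bitmap) -> int:
    """Count the pixel positions at which two equal-sized bitmaps differ."""
    if len(a) != len(b):
        raise ValueError("bitmaps must have the same number of pixels")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(query: Bitmap,
                     training: List[Tuple[Bitmap, int]]) -> Tuple[int, int]:
    """Return (label, distance) of the training example closest to the query."""
    if not training:
        raise ValueError("training set must not be empty")
    best_bitmap, best_label = min(
        training, key=lambda example: hamming_distance(query, example[0])
    )
    return best_label, hamming_distance(query, best_bitmap)


# Hypothetical 3x3 toy bitmaps standing in for scanned digits.
ONE = [0, 1, 0,
       0, 1, 0,
       0, 1, 0]
ZERO = [1, 1, 1,
        1, 0, 1,
        1, 1, 1]
TRAINING = [(ONE, 1), (ZERO, 0)]

# A "1" with one noisy pixel in the bottom-right corner.
QUERY = [0, 1, 0,
         0, 1, 0,
         0, 1, 1]

label, distance = nearest_neighbor(QUERY, TRAINING)
print(label, distance)
```

In this sketch the returned distance is the plain nearest neighbor distance, which the abstract describes as the commonly used confidence measure; the paper's new confidence measure is not specified here and is not reproduced.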

Index Terms:
optical character recognition; learning (artificial intelligence); computer vision; handwritten character classification; nearest neighbor distance; simple statistical technique; large training database; classification accuracies; handwritten digits; error rate; distance metrics; standard nearest neighbor classification system; Hamming distance metric; pixel distance metric; penstroke features extraction; National Institute of Standards and Technology; confidence metric
S.J. Smith, M.O. Bourgoin, K. Sims, H.L. Voorhees, "Handwritten Character Classification Using Nearest Neighbor in Large Databases," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 9, pp. 915-919, Sept. 1994, doi:10.1109/34.310689