Large-Scale Simulation Studies in Image Pattern Recognition
October 1997 (vol. 19 no. 10)
pp. 1067-1079

Abstract—Many obstacles to progress in image pattern recognition result from the fact that per-class distributions are often too irregular to be well-approximated by simple analytical functions. Simulation studies offer one way to circumvent these obstacles. We present three closely related studies of machine-printed character recognition that rely on synthetic data generated pseudorandomly in accordance with an explicit stochastic model of document image degradations. The unusually large scale of experiments—involving several million samples—that this methodology makes possible has allowed us to compute sharp estimates of the intrinsic difficulty (Bayes risk) of concrete image recognition problems, as well as the asymptotic accuracy and domain of competency of classifiers.

Citation:
Tin Kam Ho, Henry S. Baird, "Large-Scale Simulation Studies in Image Pattern Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 10, pp. 1067-1079, Oct. 1997, doi:10.1109/34.625107