Bibliographic References
On an Asymptotically Optimal Adaptive Classifier Design Criterion
March 1993 (vol. 15 no. 3)
pp. 312-318

A new approach for estimating classification errors is presented. In the model, there are two types of classification error: empirical error and generalization error. The first is the error observed over the training samples; the second is the discrepancy between the error probability and the empirical error. In this research, the Vapnik-Chervonenkis dimension (VC dimension) is used as a measure of classifier complexity. Based on this complexity measure, an estimate for the generalization error is developed. An optimal classifier design criterion, the generalized minimum empirical error (GMEE) criterion, is used. The GMEE criterion consists of two terms: the empirical error and the estimate of the generalization error. As an application, the criterion is used to design the optimal neural network classifier. A corollary on the Γ-optimality of neural-network-based classifiers is proven. Thus, the approach provides a theoretical foundation for the connectionist approach to optimal classifier design. Experimental results are presented to validate this approach.
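The two-term structure of the criterion can be illustrated with a small sketch. This is not the paper's exact estimator: it assumes the standard Vapnik-style uniform-convergence bound as the generalization-error term, and the candidate list, sample size, and function names are hypothetical.

```python
import math

def vc_generalization_bound(h, n, delta=0.05):
    # Standard Vapnik-style bound on the gap between true error and
    # empirical error for a hypothesis class of VC dimension h trained
    # on n samples, holding with probability at least 1 - delta.
    return math.sqrt((h * (math.log(2 * n / h) + 1) + math.log(4 / delta)) / n)

def gmee_score(empirical_error, h, n, delta=0.05):
    # GMEE-style criterion: empirical error plus the estimated
    # generalization error; the candidate minimizing this sum is chosen.
    return empirical_error + vc_generalization_bound(h, n, delta)

# Hypothetical candidate classifiers as (VC dimension, empirical error):
# a larger VC dimension buys a lower training error but a larger
# generalization term, so the criterion trades the two off.
candidates = [(10, 0.12), (50, 0.05), (200, 0.04)]
n = 1000
best = min(candidates, key=lambda c: gmee_score(c[1], c[0], n))
```

The point of the example is the trade-off itself: past some complexity, the growth of the VC-based bound outweighs further reductions in empirical error, so the criterion stops preferring larger classifiers.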

[1] J. Y. Hsiao and A. A. Sawchuk, "Supervised texture image segmentation using texture smoothing and probabilistic relaxation techniques," IEEE Trans. Patt. Anal. Machine Intell., vol. 11, no. 12, pp. 1279-1293, Dec. 1989.
[2] R. M. Haralick and L. G. Shapiro, "Survey: Image segmentation techniques," Comput. Vision Graphics Image Processing, vol. 29, pp. 100-132, 1985.
[3] L. R. Rabiner and B. H. Juang, "An introduction to hidden Markov models," IEEE ASSP Mag., pp. 4-16, Jun. 1986.
[4] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic, 1972.
[5] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[6] R. Lippmann, "Pattern classification using neural networks," IEEE Commun. Mag., vol. 27, no. 11, 1989.
[7] N. Glick, "Sample-based classification procedures related to empiric distributions," IEEE Trans. Inform. Theory, vol. IT-22, pp. 454-461, 1976.
[8] V. N. Vapnik and A. Ya. Chervonenkis, "On the uniform convergence of relative frequencies of events to their probabilities," Theory Prob. Appl., vol. 17, no. 2, pp. 264-280, 1971.
[9] V. N. Vapnik, Estimation of Dependences Based on Empirical Data. New York: Springer-Verlag, 1982.
[10] U. Grenander, Abstract Inference. New York: Wiley, 1981.
[11] L. Devroye, "Automatic pattern recognition: A study of the probability of error," IEEE Trans. Patt. Anal. Machine Intell., vol. 10, no. 4, pp. 530-543, July 1988.
[12] R. M. Dudley, "Central limit theorems for empirical measures," Ann. Prob., pp. 899-929, 1978.
[13] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, "Learning and the Vapnik-Chervonenkis dimension," U. C. Santa Cruz Tech. Rep. UCSC-CRL-87-20, 1987.
[14] E. B. Baum and D. Haussler, "What size net gives valid generalization?" in Advances in Neural Info. Processing Syst. (D. S. Touretzky, Ed.). San Mateo, CA: Morgan Kaufmann, 1988.
[15] W.-T. Lee and M. F. Tenorio, "The computation of Vapnik-Chervonenkis dimension of neural network with sigmoidal nodes," to be published in 1991.
[16] A. R. Barron, "Statistical properties of artificial neural networks," presented at 28th IEEE Conf. Dec. Contr., Tampa, FL, Dec. 1989.
[17] G. T. Toussaint, "Bibliography on estimation of misclassification," IEEE Trans. Inform. Theory, vol. 20, pp. 472-479, 1974.
[18] D. M. Foley, "Considerations of sample and feature size," IEEE Trans. Inform. Theory, vol. IT-18, pp. 618-626, 1972.
[19] W.-T. Lee and M. F. Tenorio, "On optimal adaptive classifier design-II," to be published in 1991.
[20] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing, Vol. 1. Cambridge, MA: MIT Press, 1987.
[21] G. Cybenko, "Approximation by superpositions of a sigmoidal function," CSRD Rep. No. 856, Univ. of Illinois, Urbana, Feb. 1989.
[22] D. W. Ruck, S. K. Rogers, M. Kabrisky, M. E. Oxley, and B. W. Suter, "The multilayer perceptron as an approximation to a Bayes optimal discriminant function," IEEE Trans. Neural Networks, vol. 1, no. 4, pp. 296-298, Dec. 1990.
[23] N. Morgan and H. Bourlard, "Generalization and parameter estimation in feedforward nets: Some experiments," in Advances in Neural Information Processing Systems (D. S. Touretzky, Ed.), 1990, pp. 630-637.
[24] A. S. Weigend, B. A. Huberman, and D. E. Rumelhart, "Predicting the future: A connectionist approach," TR-PARC-SSL-90-20, Stanford Univ., July 1990.
[25] D. Cohn and G. Tesauro, "How tight are the Vapnik-Chervonenkis bounds?," TR-91-03-04, Univ. Washington, 1990.
[26] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems. New York: Wiley, 1977.
[27] M. Stone, "Cross-validatory choice and assessment of statistical predictions," J. Royal Stat. Soc., vol. B-36, pp. 111-147, 1974.
[28] P. Craven and G. Wahba, "Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, vol. 31, pp. 377-403, 1979.

Index Terms:
classification error estimation; image recognition; Vapnik-Chervonenkis dimension; asymptotically optimal adaptive classifier; design criterion; generalization error; error probability; classifier complexity; generalized minimum empirical error criterion; neural network classifier; error analysis; estimation theory; neural nets; optimisation
W.T. Lee, M.F. Tenorio, "On an Asymptotically Optimal Adaptive Classifier Design Criterion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 3, pp. 312-318, March 1993, doi:10.1109/34.204915