Automatic Pattern Recognition: A Study of the Probability of Error
IEEE Transactions on Pattern Analysis and Machine Intelligence, July 1988 (vol. 10, no. 4), pp. 530-543

A test sequence is used to select the best rule from a class of discrimination rules defined in terms of the training sequence. The Vapnik-Chervonenkis and related inequalities are used to obtain distribution-free bounds on the difference between the probability of error of the selected rule and the probability of error of the best rule in the given class. The bounds are used to prove consistency and asymptotic optimality for several popular classes, including linear discriminators, nearest-neighbor rules, kernel-based rules, histogram rules, binary tree classifiers, and Fourier series classifiers. In particular, the method can be used to choose the smoothing parameter in kernel-based rules, to choose k in the k-nearest-neighbor rule, and to choose between parametric and nonparametric rules.
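The selection principle described in the abstract — train each candidate rule on the training sequence, then pick the one with the smallest empirical error on an independent test sequence — can be illustrated with a minimal sketch. This is not the paper's construction, only an illustration of the idea for choosing k in the k-nearest-neighbor rule; the synthetic 1-D data and the candidate values of k are assumptions for the example.

```python
import random

def knn_predict(train, x, k):
    # Classify x by majority vote among the k training points nearest to x.
    # train is a list of (feature, label) pairs with labels in {0, 1}.
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes >= k else 0  # k odd, so no ties

def empirical_error(rule, test):
    # Fraction of test points the rule misclassifies.
    return sum(rule(x) != y for x, y in test) / len(test)

def sample(n):
    # Synthetic problem: class y has features drawn from N(y, 0.5).
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        data.append((random.gauss(y, 0.5), y))
    return data

random.seed(0)
train, test = sample(200), sample(200)

# Hold-out selection: evaluate each candidate k on the test sequence
# and keep the k with the smallest empirical error.
candidate_ks = [1, 3, 5, 9, 15, 25]
errors = {k: empirical_error(lambda x, k=k: knn_predict(train, x, k), test)
          for k in candidate_ks}
best_k = min(candidate_ks, key=lambda k: errors[k])
print(best_k, round(errors[best_k], 3))
```

The distribution-free bounds in the paper guarantee that, uniformly over the candidate class, the test-sequence error of the selected rule is close to the error of the best rule in the class, with a gap that shrinks as the test sequence grows.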


Index Terms:
automatic pattern recognition; error statistics; artificial intelligence; probability; training sequence; linear discriminators; nearest-neighbor rules; kernel-based rules; histogram rules; binary tree classifiers; Fourier series classifiers; computerised pattern recognition
Citation:
L. Devroye, "Automatic Pattern Recognition: A Study of the Probability of Error," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 4, pp. 530-543, July 1988, doi:10.1109/34.3915