
This Article
L. Devroye, "Automatic Pattern Recognition: A Study of the Probability of Error," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 4, pp. 530-543, July 1988. doi: 10.1109/34.3915
A test sequence is used to select the best rule from a class of discrimination rules defined in terms of the training sequence. The Vapnik-Chervonenkis and related inequalities are used to obtain distribution-free bounds on the difference between the probability of error of the selected rule and the probability of error of the best rule in the given class. The bounds are used to prove the consistency and asymptotic optimality for several popular classes, including linear discriminators, nearest-neighbor rules, kernel-based rules, histogram rules, binary tree classifiers, and Fourier series classifiers. In particular, the method can be used to choose the smoothing parameter in kernel-based rules, to choose k in the k-nearest neighbor rule, and to choose between parametric and nonparametric rules.
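The selection scheme the abstract describes — among a finite set of candidate rules built from the training sequence, pick the one with the smallest empirical error on an independent test sequence — can be sketched for the case of choosing k in the k-nearest neighbor rule. This is an illustrative sketch only, not the paper's construction or notation; the function names and the data split are invented for the example, and labels are assumed to lie in {0, 1}.

```python
import numpy as np

def knn_test_error(train_X, train_y, test_X, test_y, k):
    """Empirical error on the held-out test sequence of the k-NN rule
    defined by the training sequence (binary labels, majority vote)."""
    errors = 0
    for x, y in zip(test_X, test_y):
        dist = np.sum((train_X - x) ** 2, axis=1)       # squared distances
        neighbors = train_y[np.argsort(dist)[:k]]       # labels of k nearest
        prediction = 1 if neighbors.mean() > 0.5 else 0  # majority vote
        errors += int(prediction != y)
    return errors / len(test_y)

def select_k(train_X, train_y, test_X, test_y, candidates):
    """Return the k (from a finite candidate set) whose k-NN rule
    minimizes the empirical error on the test sequence."""
    return min(candidates,
               key=lambda k: knn_test_error(train_X, train_y,
                                            test_X, test_y, k))
```

The distribution-free bounds of the paper then control, uniformly over the candidate set, how far the test-sequence error of the selected k can fall from the error of the best k in the set.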