Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners
March 1991 (vol. 13 no. 3)
pp. 252-264
This paper discusses the effects of sample size on feature selection and error estimation for several types of classifiers, focusing on the two-class problem. It explores classifier design when the design sample is small, the estimation of error rates when the test sample is small, and sample size effects in feature selection, and it gives recommendations for the choice of learning and test sample sizes. In addition to surveying prior work in this area, the paper emphasizes practical advice for designers and users of statistical pattern recognition systems.
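As a rough illustration of why test sample size matters for error estimation, the sketch below computes an approximate 95% Wilson score interval for a classifier's true error rate from a count of test errors. It is not taken from the paper: the function name `wilson_interval`, the 10% observed error rate, and the test set sizes are assumptions chosen for illustration, and the interval treats the test samples as independent draws.

```python
import math


def wilson_interval(errors, n_test, z=1.96):
    """Approximate Wilson score interval for a true error rate.

    errors -- number of misclassified test samples (hypothetical input)
    n_test -- number of test samples
    z      -- normal quantile; 1.96 corresponds to roughly 95% coverage
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    p_hat = errors / n_test                      # observed test error rate
    denom = 1.0 + z * z / n_test
    center = (p_hat + z * z / (2.0 * n_test)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n_test + z * z / (4.0 * n_test * n_test)
    )
    return max(0.0, center - half), min(1.0, center + half)


# Same observed error rate (10%) at three illustrative test set sizes.
for n_test in (20, 200, 2000):
    errors = n_test // 10
    low, high = wilson_interval(errors, n_test)
    print(f"n_test={n_test:5d}  observed=0.10  interval=({low:.3f}, {high:.3f})")
```

With the observed error rate held fixed, the interval narrows as the test set grows, which is one simple way to see why an error estimate from a few dozen test samples should be read with caution.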
Index Terms:
sample size effects; statistical pattern recognition; feature selection; error estimation; classifiers; error rates; learning; pattern recognition; statistical analysis
Citation:
S.J. Raudys, A.K. Jain, "Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 3, pp. 252-264, Mar. 1991, doi:10.1109/34.75512