A General Model for Finite-Sample Effects in Training and Testing of Competing Classifiers
December 2003 (vol. 25 no. 12)
pp. 1561-1569

Abstract—The conventional wisdom in the field of statistical pattern recognition (SPR) is that the size of the finite test sample dominates the variance in the assessment of the performance of a classical or neural classifier. The present work shows that this result has only narrow applicability. In particular, when competing algorithms are compared, the finite training sample more commonly dominates this uncertainty. This general problem in SPR is analyzed using a formal structure recently developed for multivariate random-effects receiver operating characteristic (ROC) analysis. Monte Carlo trials within the general model are used to explore the detailed statistical structure of several representative problems in the subfield of computer-aided diagnosis in medicine. The scaling laws between variance of accuracy measures and number of training samples and number of test samples are investigated and found to be comparable to those discussed in the classic text of Fukunaga, but important interaction terms have been neglected by previous authors. Finally, the importance of the contribution of finite trainers to the uncertainties argues for some form of bootstrap analysis to sample that uncertainty. The leading contemporary candidate is an extension of the 0.632 bootstrap and associated error analysis, as opposed to the more commonly used cross-validation.
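
As a rough illustration of the resampling idea mentioned at the end of the abstract, the sketch below computes the basic 0.632 bootstrap estimate of misclassification error, following Efron [25]. It is not the authors' method: the paper concerns an extension of this estimator applied to ROC accuracy measures, whereas this sketch uses plain error rate. The nearest-centroid classifier and the names `fit_centroids`, `predict`, and `bootstrap_632` are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import numpy as np

def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one mean vector per class (0 and 1)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    # Assign each row to the class whose centroid is closest (squared distance).
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def bootstrap_632(X, y, n_boot=200, seed=0):
    """Return (0.632 estimate, resubstitution error, leave-one-out bootstrap error)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Resubstitution error: train and test on the same sample (optimistic).
    err_resub = np.mean(predict(fit_centroids(X, y), X) != y)
    # Leave-one-out bootstrap error: each case is scored only by models
    # whose bootstrap training set did not contain it (pessimistic).
    wrong = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        if len(np.unique(y[idx])) < 2:
            continue                              # skip single-class resamples
        out = np.setdiff1d(np.arange(n), idx)     # cases left out of this resample
        if out.size == 0:
            continue
        pred = predict(fit_centroids(X[idx], y[idx]), X[out])
        wrong[out] += pred != y[out]
        counts[out] += 1
    seen = counts > 0
    err_loo = np.mean(wrong[seen] / counts[seen])
    # Weighted combination of the optimistic and pessimistic estimates.
    return 0.368 * err_resub + 0.632 * err_loo, err_resub, err_loo
```

Here `X` is assumed to be an array of feature vectors and `y` an integer array of 0/1 labels. Because each bootstrap replicate retrains the classifier on a different resample, the spread of results across replicates reflects the finite training sample as well as the finite test sample, which is the source of uncertainty the abstract emphasizes.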

[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, second edition. Academic Press, 1990.
[2] K. Fukunaga and R.R. Hayes, "Effects of Sample Size in Classifier Design," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 8, pp. 873-885, Aug. 1989.
[3] K. Fukunaga and R.R. Hayes, "Estimation of Classifier Performance," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 10, pp. 1087-1101, Oct. 1989.
[4] S.J. Raudys and A.K. Jain, "Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, pp. 252-264, 1991.
[5] A.K. Jain, R.P.W. Duin, and J. Mao, "Statistical Pattern Recognition: A Review," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, Jan. 2000.
[6] S. Raudys, Statistical and Neural Classifiers: An Integrated Approach to Design, p. 312, London: Springer, 2001.
[7] H.-P. Chan, B. Sahiner, R.F. Wagner, and N. Petrick, "Classifier Design for Computer-Aided Diagnosis: Effects of Finite Sample Size on the Mean Performance of Classical and Neural Network Classifiers," Medical Physics, vol. 26, no. 12, pp. 2654-2668, 1999.
[8] D.D. Dorfman, K.S. Berbaum, and C.E. Metz, "Receiver Operating Characteristic Rating Analysis: Generalization to the Population of Readers and Patients with the Jackknife Method," Investigative Radiology, vol. 27, pp. 723-731, 1992.
[9] S.V. Beiden, R.F. Wagner, and G. Campbell, "Components-of-Variance Models and Multiple-Bootstrap Experiments: An Alternative Method for Random-Effects Receiver Operating Characteristic Analysis," Academic Radiology, vol. 7, pp. 341-349, 2000.
[10] S.V. Beiden, R.F. Wagner, G. Campbell, C.E. Metz, and Y. Jiang, "Components-of-Variance Models for Random-Effects ROC Analysis: The Case of Unequal Variance Structures across Modalities," Academic Radiology, vol. 8, pp. 605-615, 2001.
[11] S.V. Beiden, R.F. Wagner, G. Campbell, and H.-P. Chan, "Analysis of Uncertainties of Estimates of Variance Components in Multivariate ROC Analysis," Academic Radiology, vol. 8, pp. 616-622, 2001.
[12] C.E. Metz, "ROC Methodology in Radiologic Imaging," Investigative Radiology, vol. 21, pp. 720-733, 1986.
[13] C.E. Metz, "Some Practical Issues of Experimental Design and Data Analysis in Radiological ROC Studies," Investigative Radiology, vol. 24, pp. 234-245, 1989.
[14] Y. Jiang, C.E. Metz, and R.M. Nishikawa, "A Receiver Operating Characteristic Partial Area Index for Highly Sensitive Diagnostic Tests," Radiology, vol. 201, pp. 745-750, 1996.
[15] C.E. Metz, "Statistical Analysis of ROC Data in Evaluating Diagnostic Performance," Multiple Regression Analysis: Applications in the Health Sciences, D. Herbert and R. Meyers, eds. New York: Am. Inst. of Physics: Am. Assoc. of Physicists in Medicine, pp. 365-384, 1986.
[16] C.E. Metz, Y. Jiang, H. MacMahon, R.M. Nishikawa, and X. Pan, "ROC Software," Kurt Rossmann Laboratories for Radiologic Image Research, Univ. of Chicago, http://www.radiology.uchicago.edu/krlroc_soft.htm, 2003.
[17] C.A. Roe and C.E. Metz, "Variance-Component Modeling in the Analysis of Receiver Operating Characteristic Index Estimates," Academic Radiology, vol. 4, pp. 587-600, 1997.
[18] B. Efron and R.J. Tibshirani, An Introduction to the Bootstrap. New York: Chapman and Hall, 1993.
[19] T. Hastie, R.J. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer-Verlag, 2001.
[20] M.A. Maloof, S.V. Beiden, and R.F. Wagner, "Analysis of Competing Classifiers in Terms of Components of Variance of ROC Accuracy Measures," Technical Report CS-02-01, Dept. of Computer Science, Georgetown Univ., Washington, D.C., http://www.cs.georgetown.edu/maloof/pubs cstr-02-01.pdf, Jan. 2002.
[21] L.P. Clarke, B.Y. Croft, E. Staab, H. Baker, and D.C. Sullivan, "National Cancer Institute Initiative: Lung Image Database Resource for Imaging Research," Academic Radiology, vol. 8, pp. 447-450, 2001.
[22] L.E. Dodd et al., "An Overview of Assessment Methodologies and Related Statistical Issues for Computer-Assist Modalities in Lung Imaging: A Status Report of the Lung Image Database Consortium (LIDC)," Academic Radiology, to be submitted, July 2003.
[23] A.P. Bradley, "The Use of the Area under the ROC Curve in the Evaluation of Machine Learning Algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145-1159, 1997.
[24] B. Efron, The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia: SIAM, 1982.
[25] B. Efron, "Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation," J. Am. Statistical Assoc., vol. 78, no. 382, pp. 316-331, 1983.
[26] G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition. New York: John Wiley & Sons, 1992.
[27] B.D. Ripley, Pattern Recognition and Neural Networks. Cambridge, U.K.: Cambridge Univ. Press, 1996.
[28] B. Efron and R.J. Tibshirani, "Improvements on Cross-Validation: The .632+ Bootstrap Method," J. Am. Statistical Assoc.: Theory and Methods, vol. 92, no. 438, pp. 548-560, 1997.
[29] R.F. Wagner, H.-P. Chan, B. Sahiner, N. Petrick, and J.T. Mossoba, "Components of Variance in ROC Analysis of CADx Classifier Performance. II: Applications of the Bootstrap," Proc. SPIE, vol. 3661, pp. 523-532, 1999.

Index Terms:
Pattern recognition, classifier design and evaluation, discriminant analysis, ROC analysis, components-of-variance models, bootstrap methods.
Citation:
Sergey V. Beiden, Marcus A. Maloof, Robert F. Wagner, "A General Model for Finite-Sample Effects in Training and Testing of Competing Classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1561-1569, Dec. 2003, doi:10.1109/TPAMI.2003.1251149