Issue No. 03 - March 2010 (vol. 32)
pp. 569-575
Juan Diego Rodríguez, University of the Basque Country, San Sebastian-Donostia
Aritz Pérez, University of the Basque Country, San Sebastian-Donostia
Jose Antonio Lozano, University of the Basque Country, San Sebastian-Donostia
In machine learning, the performance of a classifier is usually measured in terms of prediction error. In most real-world problems, the error cannot be calculated exactly and must be estimated, so it is important to choose an appropriate estimator of the error. This paper analyzes the statistical properties, bias and variance, of the k-fold cross-validation classification error estimator (k-cv). Our main contribution is a novel theoretical decomposition of the variance of the k-cv that distinguishes its sources of variance: sensitivity to changes in the training set and sensitivity to changes in the folds. The paper also compares the bias and variance of the estimator for different values of k. The experimental study is performed on artificial domains because they allow the exact computation of the quantities involved and let us rigorously specify the experimental conditions. The experiments cover two classifiers (naive Bayes and nearest neighbor), different numbers of folds, sample sizes, and training sets drawn from assorted probability distributions. We conclude with some practical recommendations on the use of k-fold cross-validation.
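The fold-sensitivity component of the variance discussed in the abstract can be illustrated with a small sketch. This is not the authors' experimental code; it is a minimal, self-contained simulation (hypothetical toy data, a 1-nearest-neighbor rule on one-dimensional inputs) showing that repeating 10-fold cross-validation on the *same* sample with different random fold assignments already yields a spread of error estimates:

```python
import random

def kfold_error(data, k, rng):
    """Estimate the classification error of a 1-nearest-neighbor rule
    via k-fold cross-validation on data = [(x, label), ...]."""
    data = data[:]
    rng.shuffle(data)                      # random assignment of cases to folds
    folds = [data[i::k] for i in range(k)]
    errors = 0
    for i in range(k):
        test = folds[i]
        train = [d for j, f in enumerate(folds) if j != i for d in f]
        for x, y in test:
            # 1-NN prediction: label of the closest training point
            pred = min(train, key=lambda t: abs(t[0] - x))[1]
            errors += pred != y
    return errors / len(data)

rng = random.Random(0)
# Toy two-class sample: class 0 centered at 0.0, class 1 centered at 1.0.
sample = [(rng.gauss(0.0, 0.5), 0) for _ in range(50)] + \
         [(rng.gauss(1.0, 0.5), 1) for _ in range(50)]

# Re-running k-cv with different fold partitions of the same training set
# exposes the "sensitivity to changes in the folds" source of variance.
estimates = [kfold_error(sample, k=10, rng=rng) for _ in range(20)]
print(min(estimates), max(estimates))      # spread due to fold assignment alone
```

Varying the sample itself between runs (instead of only the fold assignment) would expose the other component, sensitivity to changes in the training set.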
k-fold cross validation, prediction error, error estimation, bias and variance, decomposition of the variance, sources of sensitivity, supervised classification.
Juan Diego Rodríguez, Aritz Pérez, Jose Antonio Lozano, "Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.32, no. 3, pp. 569-575, March 2010, doi:10.1109/TPAMI.2009.187
[1] Y. Bengio and Y. Grandvalet, “No Unbiased Estimator of the Variance of K-Fold Cross-Validation,” J. Machine Learning Research, vol. 5, pp. 1089-1105, 2004.
[2] Y. Bengio and Y. Grandvalet, “Bias in Estimating the Variance of K-Fold Cross-Validation,” Statistical Modeling and Analysis for Complex Data Problems, vol. 1, pp. 75-95, 2005.
[3] C.M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[4] U.M. Braga-Neto, R. Hashimoto, E.R. Dougherty, D.V. Nguyen, and R.J. Carroll, “Is Cross-Validation Better than Resubstitution for Ranking Genes?” Bioinformatics, vol. 20, no. 2, pp. 253-258, 2004.
[5] U.M. Braga-Neto and E.R. Dougherty, “Is Cross-Validation Valid for Small-Sample Microarray Classification?” Bioinformatics, vol. 20, no. 3, pp. 374-380, 2004.
[6] U.M. Braga-Neto, “Small-Sample Error Estimation: Mythology versus Mathematics,” Proc. SPIE, pp. 304-314, 2005.
[7] J. Demsar, “Statistical Comparisons of Classifiers over Multiple Data Sets,” J. Machine Learning Research, vol. 7, pp. 1-30, 2006.
[8] L. Devroye and T. Wagner, “Distribution-Free Performance Bounds with the Resubstitution Error Estimate,” IEEE Trans. Information Theory, vol. 25, no. 2, pp. 208-210, Mar. 1979.
[9] L. Devroye, Non-Parametric Density Estimation. Wiley, 1985.
[10] P. Domingos, “A Unified Bias-Variance Decomposition and Its Applications,” Proc. 17th Int'l Conf. Machine Learning, pp. 231-238, 2000.
[11] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, second ed. Wiley Interscience, 2000.
[12] B. Efron and R.J. Tibshirani, “An Introduction to the Bootstrap,” Monographs on Statistics and Applied Probability, vol. 57. Chapman and Hall, 1993.
[13] J.H. Friedman, “On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality,” Data Mining and Knowledge Discovery, vol. 1, pp. 55-77, 1997.
[14] J.S. Ide and F.G. Cozman, “Generating Random Bayesian Networks with Constraints on Induced Width,” Proc. European Conf. Artificial Intelligence, 2004.
[15] G.M. James, “Variance and Bias for General Loss Functions,” Machine Learning, vol. 51, pp. 115-135, 2003.
[16] R. Kohavi, “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection,” Proc. Int'l Joint Conf. Artificial Intelligence, pp. 1137-1145, 1995.
[17] R. Kohavi, “Wrappers for Performance Enhancement and Oblivious Decision Graphs,” PhD thesis, Computer Science Dept., Stanford Univ., 1995.
[18] R. Kohavi and D.H. Wolpert, “Bias Plus Variance Decomposition for Zero-One Loss Functions,” Proc. Int'l Conf. Machine Learning, pp. 275-283, 1996.
[19] P. Langley, W. Iba, and K. Thompson, “An Analysis of Bayesian Classifiers,” Proc. 10th Nat'l Conf. Artificial Intelligence, pp. 223-228, 1992.
[20] P. Lucas, “Restricted Bayesian Network Structure Learning,” Advances in Bayesian Networks (Studies in Fuzziness and Soft Computing), pp. 217-232, Springer, 2004.
[21] G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition. John Wiley and Sons, Inc., 1992.
[22] M. Minsky, “Steps Toward Artificial Intelligence,” Trans. IRE, vol. 49, pp. 8-30, 1961.
[23] T. Mitchell, Machine Learning. McGraw-Hill, 1997.
[24] J. Pearl, Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[25] M. Sahami, “Learning Limited Dependence Bayesian Classifiers,” Proc. Second Int'l Conf. Knowledge Discovery and Data Mining, pp. 335-338, 1996.
[26] M. Stone, “Cross-Validatory Choice and Assessment of Statistical Predictions,” J. Royal Statistical Soc. Series B, vol. 36, pp. 111-147, 1974.
[27] S.M. Weiss and C.A. Kulikowski, Computer Systems That Learn. Morgan Kaufmann, 1991.
[28] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.