A Theoretical Analysis of Bagging as a Linear Combination of Classifiers
July 2008 (vol. 30 no. 7)
pp. 1293-1299
We apply an analytical framework for the analysis of linearly combined classifiers to ensembles generated by bagging. This provides an analytical model of bagging misclassification probability as a function of the ensemble size, which is a novel result in the literature. Experimental results on real data sets confirm the theoretical predictions. This allows us to derive a novel and theoretically grounded guideline for choosing bagging ensemble size. Furthermore, our results are consistent with explanations of bagging in terms of classifier instability and variance reduction, support the optimality of the simple average over the weighted average combining rule for ensembles generated by bagging, and apply to other randomization-based methods for constructing classifier ensembles. Although our results do not allow a comparison of bagging misclassification probability with that of an individual classifier trained on the original training set, we discuss how the considered theoretical framework could be exploited to this end.
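
As an informal illustration of the setting described in the abstract, the following Python sketch trains classifiers on bootstrap replicates, combines their soft outputs by simple average, and records the test error for each ensemble size. It is not the paper's analytical model: the synthetic Gaussian data, the least-squares linear base classifier, and the names `make_data`, `fit_linear`, `predict_soft`, and `bagging_error_curve` are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def make_data(n, rng):
    """Two overlapping Gaussian classes in 2-D (synthetic, illustration only)."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 1.0, -1.0)
    return x, y

def fit_linear(x, y):
    """Least-squares linear discriminant (an assumed base classifier)."""
    a = np.hstack([x, np.ones((len(x), 1))])       # append a bias column
    w, *_ = np.linalg.lstsq(a, 2.0 * y - 1.0, rcond=None)  # targets in {-1, +1}
    return w

def predict_soft(w, x):
    """Real-valued output; its sign gives the predicted class."""
    return np.hstack([x, np.ones((len(x), 1))]) @ w

def bagging_error_curve(x_tr, y_tr, x_te, y_te, n_max, rng):
    """Test error of the simple-average ensemble for sizes 1..n_max."""
    n = len(x_tr)
    running_sum = np.zeros(len(x_te))
    errors = []
    for k in range(1, n_max + 1):
        idx = rng.integers(0, n, size=n)           # bootstrap replicate
        w = fit_linear(x_tr[idx], y_tr[idx])
        running_sum += predict_soft(w, x_te)
        avg = running_sum / k                      # simple average of k outputs
        errors.append(np.mean((avg > 0).astype(int) != y_te))
    return np.array(errors)

rng = np.random.default_rng(0)
x_tr, y_tr = make_data(30, rng)                    # small training set
x_te, y_te = make_data(5000, rng)                  # large held-out set
curve = bagging_error_curve(x_tr, y_tr, x_te, y_te, 50, rng)
```

Plotting `curve` against the ensemble size gives an empirical counterpart to the kind of error-versus-size relationship the paper models analytically; the shape observed on this toy problem should not be read as a statement of the paper's results.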

Index Terms:
Multiple Classifier Systems, Bagging, Linear Combiners, Classifier Fusion, Pattern Classification.
Giorgio Fumera, Fabio Roli, Alessandra Serrau, "A Theoretical Analysis of Bagging as a Linear Combination of Classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 7, pp. 1293-1299, July 2008, doi:10.1109/TPAMI.2008.30