Upper Bounds for Error Rates of Linear Combinations of Classifiers
May 2002 (vol. 24 no. 5)
pp. 591-602

A useful notion of weak dependence between many classifiers constructed with the same training data is introduced. It is shown that if this dependence is low and the expected margins are large, then decision rules based on linear combinations of these classifiers can achieve error rates that decrease exponentially fast. Empirical results with randomized trees and with trees constructed via boosting and bagging show that weak dependence is present in these types of trees. These results also suggest a trade-off between weak dependence and expected margins: to compensate for low expected margins, the mutual dependence between the classifiers in the linear combination must be low.
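As a rough illustration of why large expected margins and low dependence lead to exponentially small error, the sketch below treats only the idealized case of fully independent classifiers, which is an assumption made here and not the weak-dependence setting of the paper. Each of n classifiers is taken to be correct with probability p, so its expected margin is mu = 2p - 1, and Hoeffding's inequality bounds the error of the equally weighted vote by exp(-n mu^2 / 2). The function names are hypothetical and chosen for this example only.

```python
import math


def margins(votes, labels, weights):
    """Normalized margin y * sum_i w_i h_i(x) / sum_i w_i for each example.

    votes[j][i] is the {-1, +1} output of classifier i on example j,
    labels[j] is the true {-1, +1} label of example j.
    """
    total = sum(weights)
    return [
        y * sum(w * h for w, h in zip(weights, row)) / total
        for row, y in zip(votes, labels)
    ]


def hoeffding_bound(n, mu):
    """Hoeffding upper bound on P(average margin <= 0).

    Assumes n independent classifiers whose individual margins lie in
    [-1, 1] and have mean mu > 0 (independence is an assumption of this
    sketch, not of the paper).
    """
    return math.exp(-n * mu * mu / 2.0)


def exact_vote_error(n, p):
    """Exact P(average margin <= 0) for n independent classifiers.

    Each classifier is correct with probability p; the vote fails when
    at most n/2 of them are correct (ties count as errors).
    """
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(0, n // 2 + 1)
    )


if __name__ == "__main__":
    p = 0.6                      # per-classifier accuracy (illustrative value)
    mu = 2 * p - 1               # expected margin of a single classifier
    for n in (11, 51, 101):
        print(n, exact_vote_error(n, p), hoeffding_bound(n, mu))
```

In this independent setting both the exact error and the bound shrink as n grows, and the bound shrinks faster when mu is larger. When classifiers are dependent, the same margin buys less, which is consistent with the trade-off described in the abstract.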


Index Terms:
Exponential bounds, weakly dependent classifiers, classification trees, machine learning
A. Murua, "Upper Bounds for Error Rates of Linear Combinations of Classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 591-602, May 2002, doi:10.1109/34.1000235