
A. Murua, "Upper Bounds for Error Rates of Linear Combinations of Classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 591-602, May 2002.
doi: 10.1109/34.1000235
Keywords: exponential bounds, weakly dependent classifiers, classification trees, machine learning
A useful notion of weak dependence between many classifiers constructed with the same training data is introduced. It is shown that if this weak dependence is low and the expected margins are large, then decision rules based on linear combinations of these classifiers can achieve error rates that decrease exponentially fast. Empirical results with randomized trees, and with trees constructed via boosting and bagging, show that weak dependence is present in these types of trees. Furthermore, these results suggest a tradeoff between weak dependence and expected margins: to compensate for low expected margins, there should be low mutual dependence between the classifiers in the linear combination.
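
To make the abstract's two quantities concrete, here is a minimal sketch, assuming scikit-learn and a synthetic binary classification task (both are illustrative assumptions, not from the paper). It estimates the expected margin of a bagged ensemble of trees, using the standard voting margin with labels in {-1, +1}, and summarizes mutual dependence as the average pairwise correlation between the trees' error indicators; the paper's weak-dependence notion is related to, but not identical to, this crude statistic.

# Illustrative sketch only: estimates the expected margin of a bagged tree
# ensemble and a simple dependence summary. Dataset and all names are
# hypothetical, not taken from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: each tree is fit on a bootstrap resample of the same training data.
n_trees = 50
trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    trees.append(DecisionTreeClassifier(max_depth=3).fit(X_tr[idx], y_tr[idx]))

def to_pm1(a):
    # Map {0, 1} labels to {-1, +1} so the voting margin is a simple average.
    return 2 * a - 1

# The ensemble margin at (x, y) is the average vote for the true class:
# margin(x, y) = (1/n) * sum_j y * h_j(x), with y, h_j(x) in {-1, +1}.
votes = np.array([to_pm1(t.predict(X_te)) for t in trees])  # (n_trees, n_test)
margins = (to_pm1(y_te) * votes).mean(axis=0)
print("expected margin (test estimate):", margins.mean())

# Dependence summary: average pairwise correlation of the trees' error
# indicators. Low dependence plus large margins is the regime in which
# the paper's exponential bounds are informative.
errors = (votes != to_pm1(y_te)).astype(float)              # (n_trees, n_test)
corr = np.corrcoef(errors)
off_diag = corr[~np.eye(n_trees, dtype=bool)]
print("mean pairwise error correlation:", off_diag.mean())

Under the tradeoff the abstract describes, an ensemble with a small mean margin would need a correspondingly small mean error correlation for fast error decay, whereas highly correlated classifiers require large margins.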