On the Dual Formulation of Boosting Algorithms
December 2010 (vol. 32, no. 12)
pp. 2216-2231
Chunhua Shen, NICTA, Canberra Research Laboratory and Australian National University, Canberra
Hanxi Li, NICTA, Canberra Research Laboratory and Australian National University, Canberra
We study boosting algorithms from a new perspective. We show that the Lagrange dual problems of \ell_1-norm-regularized AdaBoost, LogitBoost, and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems. By looking at the dual problems of these boosting algorithms, we show that the success of boosting can be understood in terms of maintaining a better margin distribution by maximizing margins and, at the same time, controlling the margin variance. We also theoretically prove that, approximately, \ell_1-norm-regularized AdaBoost maximizes the average margin instead of the minimum margin. The dual formulation also enables us to develop column-generation-based optimization algorithms, which are totally corrective. We show that they exhibit almost identical classification results to those of standard stagewise additive boosting algorithms, but with much faster convergence rates. Therefore, fewer weak classifiers are needed to build the ensemble using our proposed optimization technique.
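To make the duality concrete, here is a rough sketch of the kind of primal-dual pair the paper studies; the notation is illustrative rather than taken verbatim from the paper. Given training pairs (x_i, y_i), i = 1, \dots, m, weak classifiers h_j, and nonnegative coefficients w with \|w\|_1 \le T, the \ell_1-norm-regularized AdaBoost primal and its Lagrange dual take roughly the form

\min_{w \ge 0,\ \|w\|_1 \le T} \ \log \sum_{i=1}^{m} \exp\Bigl(-y_i \sum\nolimits_j w_j h_j(x_i)\Bigr),

\max_{u,\ r} \ -\sum_{i=1}^{m} u_i \log u_i - T r \quad \text{s.t.} \quad \sum_{i=1}^{m} u_i y_i h_j(x_i) \le r \ \ \forall j, \quad u \ge 0, \ \mathbf{1}^{\top} u = 1.

The dual maximizes the Shannon entropy of the sample weights u while capping the edge of every weak classifier at r, which is the entropy-maximization view the abstract describes.

The column-generation scheme can likewise be sketched in a few lines of code. The following Python sketch implements a soft-margin LPBoost-style totally corrective loop under assumed conventions (a pool of decision stumps as the weak classifiers, SciPy's linprog as the LP solver); it illustrates the general technique and is not the authors' implementation:

import numpy as np
from scipy.optimize import linprog

def stump_pool(X):
    # One threshold stump per (feature, threshold, sign): h(x) = s if x[f] <= thr else -s.
    return [(f, thr, s)
            for f in range(X.shape[1])
            for thr in np.unique(X[:, f])
            for s in (1.0, -1.0)]

def stump_predict(X, stump):
    f, thr, s = stump
    return s * np.where(X[:, f] <= thr, 1.0, -1.0)

def lpboost_column_generation(X, y, nu=0.2, max_iters=30, tol=1e-6):
    # Illustrative soft-margin LPBoost via column generation; y has entries in {-1, +1}.
    m = len(y)
    D = 1.0 / (nu * m)                  # cap on each dual sample weight u_i
    u = np.full(m, 1.0 / m)             # start from uniform sample weights
    pool = stump_pool(X)
    cols, selected = [], []
    w, best_edge = np.array([]), -np.inf
    for _ in range(max_iters):
        # Oracle step: pick the weak classifier with the largest edge sum_i u_i y_i h(x_i).
        edges = [np.dot(u * y, stump_predict(X, s)) for s in pool]
        j = int(np.argmax(edges))
        if edges[j] <= best_edge + tol:
            break                       # no dual constraint is violated: optimal
        selected.append(pool[j])
        cols.append(stump_predict(X, pool[j]))
        H = np.column_stack(cols)       # m x n matrix of weak-classifier predictions
        n = H.shape[1]
        # Restricted master problem (primal LPBoost), variables [w (n), xi (m), rho]:
        #   min  -rho + D * sum(xi)
        #   s.t. rho - y_i (H w)_i - xi_i <= 0,  sum_j w_j = 1,  w >= 0, xi >= 0.
        c = np.concatenate([np.zeros(n), D * np.ones(m), [-1.0]])
        A_ub = np.hstack([-(y[:, None] * H), -np.eye(m), np.ones((m, 1))])
        A_eq = np.concatenate([np.ones(n), np.zeros(m), [0.0]])[None, :]
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (n + m) + [(None, None)],
                      method="highs")
        w = res.x[:n]
        # Totally corrective step: the new sample weights are the LP duals of the
        # margin constraints (marginals of <= constraints are nonpositive here).
        u = np.clip(-res.ineqlin.marginals, 0.0, D)
        u /= u.sum()
        best_edge = max(np.dot(u * y, H[:, k]) for k in range(n))
    return selected, w

Each round adds the weak classifier whose edge most violates a dual constraint and then re-solves the restricted master problem over all selected classifiers, so every coefficient is updated at every iteration; this is what "totally corrective" means, and it is why such methods need far fewer weak classifiers than stagewise boosting. Predictions on new data are then sign(\sum_j w_j h_j(x)).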

Index Terms:
AdaBoost, LogitBoost, LPBoost, Lagrange duality, linear programming, entropy maximization.
Citation:
Chunhua Shen, Hanxi Li, "On the Dual Formulation of Boosting Algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2216-2231, Dec. 2010, doi:10.1109/TPAMI.2010.47