Issue No. 11 - November (2009 vol. 31)
Richard Nock, Université Antilles-Guyane, CEREGMIA-UFR Droit et Sciences Economiques, France
Frank Nielsen, Ecole Polytechnique, France
Bartlett et al. (2006) recently proved that a ground condition for surrogates, classification calibration, ties their consistent minimization to that of the classification risk, and they left the algorithmic questions about this minimization as an important open problem. In this paper, we address this problem for a wide set of surrogates that lies at the intersection of classification-calibrated surrogates and those of Murata et al. (2004). This set coincides with the surrogates satisfying three common assumptions. Equivalent expressions for its members, some of them well known, follow for convex and concave surrogates, which are frequently used in the induction of linear separators and decision trees. Most notably, these surrogates share remarkable algorithmic features: for each of these two types of classifiers, we give a minimization algorithm that provably converges to the minimum of any such surrogate. Although the two algorithms seem different, we show that they are offshoots of the same “master” algorithm. This provides a new and broad unified account of several popular algorithms, including additive regression with the squared loss and the logistic loss, and the top-down induction performed in CART and C4.5. Moreover, we show that the induction enjoys the most popular boosting features, regardless of the surrogate. Experiments are reported on 40 readily available domains.
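As a generic illustration of the two notions the abstract relies on (this is not the authors' "master" algorithm), the sketch below computes a scalar Bregman divergence D_phi(x || y) = phi(x) - phi(y) - (x - y) phi'(y) and checks classification calibration for a few familiar convex margin-based surrogates, using the known criterion of Bartlett et al. (2006) that a convex surrogate is calibrated iff it is differentiable at 0 with negative derivative there. The function names (`bregman`, `is_calibrated_convex`) and the dictionary `SURROGATES` are hypothetical, and the calibration check only tests the sign of a finite-difference slope at 0, assuming the caller already knows the surrogate is convex and differentiable at 0.

```python
import math

def bregman(phi, dphi, x, y):
    """Scalar Bregman divergence D_phi(x || y) = phi(x) - phi(y) - (x - y) * phi'(y),
    for a strictly convex, differentiable generator phi with derivative dphi."""
    return phi(x) - phi(y) - (x - y) * dphi(y)

# Generator phi(u) = u^2: the Bregman divergence is the squared distance (x - y)^2.
def sq(u):
    return u * u

def dsq(u):
    return 2.0 * u

# Generator phi(u) = negative binary entropy on (0, 1):
# the Bregman divergence is the binary Kullback-Leibler divergence KL(x || y).
def negent(u):
    return u * math.log(u) + (1.0 - u) * math.log(1.0 - u)

def dnegent(u):
    return math.log(u) - math.log(1.0 - u)

# Common convex surrogates F(z) of the margin z = y * h(x), with y in {-1, +1}.
SURROGATES = {
    "squared": lambda z: (1.0 - z) ** 2,
    "logistic": lambda z: math.log1p(math.exp(-z)),
    "exponential": lambda z: math.exp(-z),
}

def is_calibrated_convex(F, eps=1e-6):
    """Sign test from Bartlett et al. (2006) for convex F: calibrated iff F is
    differentiable at 0 and F'(0) < 0. Only the sign of a central finite
    difference at 0 is checked here; convexity and differentiability are assumed."""
    slope = (F(eps) - F(-eps)) / (2.0 * eps)
    return slope < 0.0
```

For example, `bregman(sq, dsq, 3.0, 1.0)` returns `4.0`, all three entries of `SURROGATES` pass `is_calibrated_convex`, and the convex but uncalibrated loss `lambda z: z * z` fails it, since its slope at 0 is zero.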
Index Terms: Ensemble learning, boosting, Bregman divergences, linear separators, decision trees.
Richard Nock, Frank Nielsen, "Bregman Divergences and Surrogates for Learning", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 11, pp. 2048-2059, November 2009, doi:10.1109/TPAMI.2008.225