Cost-Sensitive Boosting
February 2011 (vol. 33 no. 2)
pp. 294-309
Hamed Masnadi-Shirazi, University of California at San Diego, La Jolla
Nuno Vasconcelos, University of California at San Diego, La Jolla
A novel framework is proposed for the design of cost-sensitive boosting algorithms. The framework is based on the identification of two necessary conditions for optimal cost-sensitive learning: 1) expected losses must be minimized by optimal cost-sensitive decision rules, and 2) empirical loss minimization must emphasize the neighborhood of the target cost-sensitive boundary. It is shown that these conditions enable the derivation of cost-sensitive losses that can be minimized by gradient descent, in the functional space of convex combinations of weak learners, to produce novel boosting algorithms. The proposed framework is applied to the derivation of cost-sensitive extensions of AdaBoost, RealBoost, and LogitBoost. Experimental evidence, on a synthetic problem, standard data sets, and the computer vision problems of face and car detection, is presented in support of the cost-sensitive optimality of the new algorithms. Their performance is also compared to that of various previous cost-sensitive boosting proposals, as well as to the popular combination of large-margin classifiers and probability calibration. Cost-sensitive boosting is shown to consistently outperform all other methods.
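To illustrate the general idea described in the abstract, here is a minimal Python sketch of cost-sensitive boosting with decision stumps. It is not the authors' derivation: the cost parameters `c_pos` and `c_neg`, the exhaustive stump search, and the cost-modulated exponential re-weighting are illustrative assumptions. The sketch shows the two ingredients the abstract names: example weights initialized with class-dependent costs, and a re-weighting step that emphasizes errors on the costlier class.

```python
import numpy as np

def stump_predict(X, feat, thresh, polarity):
    """Decision stump: predicts +1/-1 by thresholding one feature."""
    return polarity * np.where(X[:, feat] > thresh, 1.0, -1.0)

def cost_sensitive_adaboost(X, y, c_pos=2.0, c_neg=1.0, rounds=20):
    """Sketch of an AdaBoost-style loop with class-dependent costs.

    c_pos / c_neg are hypothetical misclassification costs for the
    positive and negative classes; they bias both the initial example
    weights and every subsequent re-weighting step.
    """
    cost = np.where(y > 0, c_pos, c_neg)       # per-example misclassification cost
    w = cost / cost.sum()                      # cost-weighted initial distribution
    ensemble = []
    for _ in range(rounds):
        # exhaustive search for the stump with lowest weighted error
        best = None
        for feat in range(X.shape[1]):
            for thresh in np.unique(X[:, feat]):
                for polarity in (1.0, -1.0):
                    pred = stump_predict(X, feat, thresh, polarity)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, polarity)
        err, feat, thresh, polarity = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # standard AdaBoost step size
        pred = stump_predict(X, feat, thresh, polarity)
        # cost-modulated exponential re-weighting: costlier examples
        # regain weight faster when misclassified
        w *= np.exp(-alpha * y * pred * cost / cost.max())
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, polarity))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of all stumps in the ensemble."""
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.sign(score)
```

The same loop with `c_pos = c_neg` reduces to ordinary AdaBoost with stumps; increasing `c_pos` shifts the learned boundary toward fewer false negatives, which is the behavior the paper's cost-sensitive losses formalize.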


Index Terms:
Boosting, AdaBoost, cost-sensitive learning, asymmetric boosting.
Citation:
Hamed Masnadi-Shirazi, Nuno Vasconcelos, "Cost-Sensitive Boosting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 2, pp. 294-309, Feb. 2011, doi:10.1109/TPAMI.2010.71