Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem
January 2006 (vol. 18, no. 1)
pp. 63-77
This paper empirically studies the effects of sampling and threshold-moving in training cost-sensitive neural networks. Both oversampling and undersampling are considered; these techniques modify the distribution of the training data so that the costs of the examples are conveyed explicitly by how often the examples appear. Threshold-moving moves the output threshold toward the inexpensive classes so that examples with higher costs become harder to misclassify. Moreover, hard-ensemble and soft-ensemble, i.e., combinations of the above techniques via hard or soft voting schemes, are also tested. Twenty-one UCI data sets with three types of cost matrices and a real-world cost-sensitive data set are used in the empirical study. The results suggest that cost-sensitive learning is more difficult on multiclass tasks than on two-class tasks, and that a higher degree of class imbalance may increase the difficulty. They also reveal that almost all the techniques are effective on two-class tasks, while most are ineffective, and may even have a negative effect, on multiclass tasks. Overall, threshold-moving and soft-ensemble are relatively good choices for training cost-sensitive neural networks. The empirical study also suggests that some methods believed to be effective in addressing the class imbalance problem may, in fact, be effective only on imbalanced two-class data sets.
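Since the abstract singles out threshold-moving as one of the better choices, a small sketch may help make the idea concrete. The code below realizes threshold-moving as the standard expected-cost decision rule applied to a trained network's real-valued outputs; the function name and the exact rescaling are illustrative assumptions, not necessarily the paper's precise formulation.

```python
import numpy as np

def threshold_moving_predict(outputs, cost):
    """Cost-sensitive prediction by rescaling network outputs.

    outputs : (n_samples, n_classes) real-valued outputs of a trained
              network (e.g., softmax probabilities).
    cost    : (n_classes, n_classes) matrix; cost[i, j] is the cost of
              predicting class j for a true class-i example.

    Picking the class with minimum expected cost effectively moves the
    decision threshold toward the inexpensive classes, so high-cost
    examples become harder to misclassify.
    """
    # expected_cost[n, j] = sum_i outputs[n, i] * cost[i, j]
    expected_cost = outputs @ cost
    return expected_cost.argmin(axis=1)

# Hypothetical two-class example: misclassifying a class-1 example is
# five times as costly as misclassifying a class-0 example.
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])
probs = np.array([[0.8, 0.2]])                 # the network favors class 0
print(threshold_moving_predict(probs, cost))   # -> [1]; the costly class wins
```

Note that in this sketch the cost matrix enters only at decision time, so the same trained network can be reused under different cost matrices without retraining.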

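The sampling techniques can be sketched in the same spirit: class frequencies in the training set are altered so that cost is conveyed by how often examples appear. Below is a minimal illustration of cost-proportional oversampling, assuming integer class labels 0..C-1 and a per-class cost vector (e.g., the row sums of the cost matrix); the fractional-replication detail and all names are illustrative, not the paper's exact procedure.

```python
import numpy as np

def cost_proportional_oversample(X, y, class_cost, seed=None):
    """Replicate examples so class i appears in proportion to class_cost[i].

    X, y       : training data with integer labels 0..C-1.
    class_cost : (n_classes,) vector summarizing the cost of
                 misclassifying each class.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(class_cost, dtype=float)
    w = w / w.min()                       # cheapest class is kept as-is
    idx = []
    for i, wi in enumerate(w):
        cls = np.flatnonzero(y == i)      # indices of class-i examples
        whole = int(wi)                   # whole copies of the class
        idx.extend(np.tile(cls, whole))
        # fractional remainder: a random subsample without replacement
        n_extra = int(round((wi - whole) * len(cls)))
        idx.extend(rng.choice(cls, size=n_extra, replace=False))
    idx = np.asarray(idx)
    return X[idx], y[idx]
```

Undersampling would instead remove examples from the inexpensive classes (e.g., keep only a cost-proportional subsample of each class), at the price of discarding training data.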

Index Terms:
Machine learning, data mining, neural networks, cost-sensitive learning, class imbalance learning, sampling, threshold-moving, ensemble learning.
Citation:
Zhi-Hua Zhou, Xu-Ying Liu, "Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 63-77, Jan. 2006, doi:10.1109/TKDE.2006.17