Issue No. 05 - May 2009 (vol. 21)
pp. 638-651
Claudia Diamantini, Università Politecnica delle Marche, Ancona
ABSTRACT
The class-imbalance problem is the problem of learning a classification rule from data that are skewed in favor of one class. On such datasets, traditional learning techniques tend to overlook the less numerous class, to the advantage of the majority class. However, the minority class is often the most interesting one for the task at hand. For this reason, the class-imbalance problem has received increasing attention in recent years. In this paper, we draw the reader's attention to a learning algorithm for the minimization of the average misclassification risk. In contrast to some popular class-imbalance learning methods, this method has its roots in statistical decision theory. A particularly interesting characteristic is that, when the class distributions are unknown, the method can still be applied by resorting to a stochastic gradient algorithm. We study the behavior of this algorithm on imbalanced datasets, demonstrating that this principled approach achieves better classification performance than the principal methods proposed in the literature.
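The abstract only sketches the approach at a high level, so the following is a minimal illustrative example of the general idea it describes: a labeled vector quantizer whose prototypes are adjusted by stochastic gradient steps, with step sizes weighted by class-dependent misclassification costs. The class name CostSensitiveLVQ, its update rule, the cost values, and the synthetic data are all assumptions made for illustration; this is not the Bayes Vector Quantizer algorithm derived in the paper.

import numpy as np

# Illustrative sketch only (not the paper's BVQ algorithm): a nearest-prototype
# classifier trained by cost-weighted stochastic gradient updates.
class CostSensitiveLVQ:
    def __init__(self, prototypes, labels, costs, lr=0.05):
        self.w = np.asarray(prototypes, dtype=float)  # code vectors
        self.y = np.asarray(labels)                   # class label of each code vector
        self.costs = costs                            # costs[c]: cost of misclassifying class c
        self.lr = lr                                  # learning rate (gradient step size)

    def predict(self, x):
        # nearest-prototype decision rule
        return self.y[np.argmin(np.linalg.norm(self.w - x, axis=1))]

    def partial_fit(self, x, c):
        # One stochastic step on the labeled sample (x, c): attract the nearest
        # prototype if its label matches c, repel it otherwise, and scale the
        # step by the cost of misclassifying class c.
        k = np.argmin(np.linalg.norm(self.w - x, axis=1))
        sign = 1.0 if self.y[k] == c else -1.0
        self.w[k] += self.lr * self.costs[c] * sign * (x - self.w[k])

# Hypothetical usage on a synthetic imbalanced dataset (95% class 0, 5% class 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (950, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 950 + [1] * 50)
prototypes = np.vstack([X[y == 0][:8], X[y == 1][:2]])
model = CostSensitiveLVQ(prototypes, [0] * 8 + [1] * 2, costs={0: 1.0, 1: 19.0})
for _ in range(5):
    for i in rng.permutation(len(X)):
        model.partial_fit(X[i], y[i])

Weighting each step by a class-dependent cost is one simple heuristic for keeping minority-class prototypes from being absorbed by the majority class; as the abstract states, the paper's method instead derives its updates from the average misclassification risk within a statistical decision-theoretic framework.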
INDEX TERMS
Clustering, classification, and association rules, Data mining, Mining methods and algorithms, Machine learning, Classifier design and evaluation
CITATION
Claudia Diamantini, "Bayes Vector Quantizer for Class-Imbalance Problem", IEEE Transactions on Knowledge & Data Engineering, vol.21, no. 5, pp. 638-651, May 2009, doi:10.1109/TKDE.2008.187