
Issue No. 05 - May 2009 (vol. 21)

pp: 638-651

Claudia Diamantini , Università Politecnica delle Marche, Ancona

Domenico Potena , Università Politecnica delle Marche, Ancona

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.187

ABSTRACT

The class-imbalance problem is the problem of learning a classification rule from data that are skewed in favor of one class. On such datasets, traditional learning techniques tend to overlook the less numerous class to the advantage of the majority class. However, the minority class is often the most interesting one for the task at hand, and for this reason the class-imbalance problem has received increasing attention in recent years. In this paper, we draw the reader's attention to a learning algorithm for the minimization of the average misclassification risk. In contrast to some popular class-imbalance learning methods, this method has its roots in statistical decision theory. A particularly interesting characteristic is that, when class distributions are unknown, the method can work by resorting to a stochastic gradient algorithm. We study the behavior of this algorithm on imbalanced datasets, showing that this principled approach yields better classification performance than the principal methods proposed in the literature.
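The core idea the abstract describes, a labeled vector quantizer whose prototypes are adjusted by a stochastic gradient rule so as to reduce average misclassification risk, can be illustrated with a toy sketch. This is not the authors' Bayes Vector Quantizer: it is a hedged, LVQ1-style approximation on hypothetical one-dimensional data, where the dataset, the per-class cost weights, and the learning-rate schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced two-class data: class 0 (majority), class 1 (minority).
n0, n1 = 900, 100
X = np.concatenate([rng.normal(0.0, 1.0, n0), rng.normal(2.0, 1.0, n1)])
y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])

# Labeled codebook: a few prototypes per class (a vector quantizer with class labels).
protos = np.array([-1.0, 0.5, 1.5, 3.0])
labels = np.array([0, 0, 1, 1])

# Illustrative misclassification costs: errors on the minority class weigh more.
cost = np.array([1.0, n0 / n1])

def train(protos, labels, X, y, epochs=20, lr0=0.05):
    """Stochastic-gradient-style prototype update (LVQ1-like sketch):
    pull the nearest prototype toward a correctly classified sample,
    push it away from a misclassified one, weighted by the sample's class cost."""
    protos = protos.copy()
    for t in range(epochs):
        lr = lr0 / (1 + t)  # decreasing step size, as in stochastic gradient methods
        for i in rng.permutation(len(X)):
            j = np.argmin(np.abs(protos - X[i]))        # nearest prototype
            sign = 1.0 if labels[j] == y[i] else -1.0   # attract or repel
            protos[j] += sign * lr * cost[y[i]] * (X[i] - protos[j])
    return protos

protos = train(protos, labels, X, y)

# Classify by the label of the nearest prototype.
pred = labels[np.argmin(np.abs(protos[None, :] - X[:, None]), axis=1)]
```

Because the repulsion on majority-class prototypes is weighted by the minority-class cost, the decision boundary is pushed away from the minority region, which is one intuition for why cost-sensitive prototype learning can help on imbalanced data.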

INDEX TERMS

Clustering, classification, and association rules, Data mining, Mining methods and algorithms, Machine learning, Classifier design and evaluation

CITATION

Claudia Diamantini, Domenico Potena, "Bayes Vector Quantizer for Class-Imbalance Problem",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 21, no. 5, pp. 638-651, May 2009, doi:10.1109/TKDE.2008.187
