Issue No.01 - Jan. (2014 vol.26)
pp: 131-143
Yubin Park , The University of Texas at Austin, Austin
Joydeep Ghosh , The University of Texas at Austin, Austin
ABSTRACT
This paper introduces two kinds of decision tree ensembles for imbalanced classification problems, extensively utilizing properties of $\alpha$-divergence. First, a novel splitting criterion based on $\alpha$-divergence is shown to generalize several well-known splitting criteria, such as those used in C4.5 and CART. When the $\alpha$-divergence splitting criterion is applied to imbalanced data, one can obtain decision trees that tend to be less correlated ($\alpha$-diversification) by varying the value of $\alpha$. This increased diversity in an ensemble of such trees improves AUROC values across a range of minority class priors. The second ensemble uses the same $\alpha$-trees as base classifiers, but applies a lift-aware stopping criterion during tree growth. The resultant ensemble produces a set of interpretable rules that provide higher lift values for a given coverage, a property that is highly desirable in applications such as direct marketing. Experimental results across many class-imbalanced data sets, including the BRFSS and MIMIC data sets from the medical domain and several sets from the UCI and KEEL repositories, are provided to highlight the effectiveness of the proposed ensembles over a wide range of data distributions and degrees of class imbalance.
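To make the generalization claim concrete: a closely related family of impurity measures, the Tsallis-style $\alpha$-entropy, recovers Shannon entropy (the basis of C4.5's information gain) as $\alpha \to 1$ and Gini impurity (used by CART) at $\alpha = 2$. The sketch below illustrates this family; note it is an assumption for illustration only, and the paper's actual $\alpha$-divergence splitting criterion is defined in the full text, not here.

```python
import math

def generalized_entropy(p, alpha):
    """Tsallis-style alpha-entropy of a class distribution p.

    Illustrative only (not the paper's exact criterion):
    - as alpha -> 1 it recovers Shannon entropy (C4.5's measure),
    - at alpha = 2 it equals Gini impurity (CART's measure).
    """
    if abs(alpha - 1.0) < 1e-9:
        # Limit alpha -> 1: Shannon entropy in nats
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return (1.0 - sum(pi ** alpha for pi in p)) / (alpha - 1.0)

# A balanced binary node: Gini = 0.5, Shannon entropy = ln 2
balanced = [0.5, 0.5]
print(generalized_entropy(balanced, 2.0))  # 0.5 (Gini)
print(generalized_entropy(balanced, 1.0))  # ~0.693 (ln 2)
```

Varying $\alpha$ smoothly interpolates between (and beyond) these familiar impurities, which is what allows an ensemble of trees grown with different $\alpha$ values to be diverse.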
INDEX TERMS
Decision trees, impurities, equations, measurement, training, training data, entropy, ensemble classification, data mining, imbalanced data sets, lift
CITATION
Yubin Park, Joydeep Ghosh, "Ensembles of $\alpha$-Trees for Imbalanced Classification Problems," IEEE Transactions on Knowledge & Data Engineering, vol. 26, no. 1, pp. 131-143, Jan. 2014, doi:10.1109/TKDE.2012.255