Issue No. 01 - Jan. (2014 vol. 26)
ISSN: 1041-4347
pp: 131-143
Yubin Park , The University of Texas at Austin, Austin
Joydeep Ghosh , The University of Texas at Austin, Austin
This paper introduces two kinds of decision tree ensembles for imbalanced classification problems, both of which make extensive use of the properties of $\alpha$-divergence. First, a novel splitting criterion based on $\alpha$-divergence is shown to generalize several well-known splitting criteria, such as those used in C4.5 and CART. When the $\alpha$-divergence splitting criterion is applied to imbalanced data, varying the value of $\alpha$ yields decision trees that tend to be less correlated ($\alpha$-diversification). This increased diversity in an ensemble of such trees improves AUROC values across a range of minority class priors. The second ensemble uses the same $\alpha$-trees as base classifiers, but applies a lift-aware stopping criterion during tree growth. The resulting ensemble produces a set of interpretable rules that provide higher lift values for a given coverage, a property that is highly desirable in applications such as direct marketing. Experimental results on many class-imbalanced data sets, including the BRFSS and MIMIC data sets from the medical community and several sets from UCI and KEEL, highlight the effectiveness of the proposed ensembles over a wide range of data distributions and degrees of class imbalance.
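As a rough illustration of how an $\alpha$-divergence can serve as a splitting criterion, the sketch below scores a candidate split by the divergence between the joint distribution of (branch, class) and the product of its marginals. It assumes the common form $D_\alpha(p\|q) = \frac{1}{\alpha(1-\alpha)}\sum_i \left[\alpha p_i + (1-\alpha) q_i - p_i^{\alpha} q_i^{1-\alpha}\right]$, whose $\alpha \to 1$ limit is the KL divergence, so the score reduces to information gain in that limit. The function names `alpha_divergence` and `split_score` are hypothetical, and the exact definition and normalization used in the paper may differ.

```python
import math


def alpha_divergence(p, q, alpha):
    """Alpha-divergence between two discrete distributions (assumed form).

    Only alpha > 0 is handled in this sketch. Cells with q == 0 are skipped,
    which is safe when q is a product of marginals of p (then p == 0 too).
    """
    if alpha <= 0:
        raise ValueError("this sketch only handles alpha > 0")
    cells = [(pi, qi) for pi, qi in zip(p, q) if qi > 0]
    if abs(alpha - 1.0) < 1e-12:
        # Limit alpha -> 1: KL divergence for normalized p and q.
        return sum(pi * math.log(pi / qi) for pi, qi in cells if pi > 0)
    total = sum(
        alpha * pi + (1 - alpha) * qi - (pi ** alpha) * (qi ** (1 - alpha))
        for pi, qi in cells
    )
    return total / (alpha * (1 - alpha))


def split_score(counts, alpha):
    """Score a candidate split from a contingency table.

    counts[b][c] is the number of training examples of class c that fall
    into branch b. The score is the alpha-divergence between the joint
    distribution and the product of its marginals (0 for a useless split).
    """
    n = sum(sum(row) for row in counts)
    row_marg = [sum(row) / n for row in counts]
    col_marg = [sum(row[j] for row in counts) / n for j in range(len(counts[0]))]
    joint = [c / n for row in counts for c in row]
    product = [r * c for r in row_marg for c in col_marg]
    return alpha_divergence(joint, product, alpha)


if __name__ == "__main__":
    # Hypothetical imbalanced node: 950 majority vs. 50 minority examples.
    candidate = [[700, 10], [250, 40]]
    for a in (0.5, 1.0, 1.5, 2.0):
        print(f"alpha={a}: score={split_score(candidate, a):.6f}")
```

Under these assumptions, growing one tree per value of $\alpha$ and averaging their predictions would give a simple ensemble in the spirit of the $\alpha$-diversification described above, since different values of $\alpha$ can rank the same candidate splits differently.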
Decision trees, Impurities, Equations, Measurement, Training, Training data, Entropy

Y. Park and J. Ghosh, "Ensembles of $\alpha$-Trees for Imbalanced Classification Problems," in IEEE Transactions on Knowledge & Data Engineering, vol. 26, no. 1, pp. 131-143, 2013.