Issue No.01 - Jan. (2013 vol.25)
Shuo Wang , The University of Birmingham, Birmingham
Xin Yao , The University of Birmingham, Birmingham
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.207
In class imbalance learning problems, how to better recognize examples from the minority class is the key focus, since it is usually more important and expensive than the majority class. Quite a few ensemble solutions have been proposed in the literature with varying degrees of success. It is generally believed that diversity in an ensemble could help to improve the performance of class imbalance learning. However, no study has actually investigated diversity in depth in terms of its definitions and effects in the context of class imbalance learning. It is unclear whether diversity will have a similar or different impact on the performance of minority and majority classes. In this paper, we aim to gain a deeper understanding of if and when ensemble diversity has a positive impact on the classification of imbalanced data sets. First, we explain when and why diversity measured by Q-statistic can bring improved overall accuracy based on two classification patterns proposed by Kuncheva et al. We define and give insights into good and bad patterns in imbalanced scenarios. Then, the pattern analysis is extended to single-class performance measures, including recall, precision, and F-measure, which are widely used in class imbalance learning. Six different situations of diversity's impact on these measures are obtained through theoretical analysis. Finally, to further understand how diversity affects the single class performance and overall performance in class imbalance problems, we carry out extensive experimental studies on both artificial data sets and real-world benchmarks with highly skewed class distributions. We find strong correlations between diversity and discussed performance measures. Diversity shows a positive impact on the minority class in general. It is also beneficial to the overall performance in terms of AUC and G-mean.
Accuracy, Diversity reception, Correlation, Context, Pattern analysis, Training, Boosting, data mining, Class imbalance learning, ensemble learning, diversity, single-class performance measures
Shuo Wang, Xin Yao, "Relationships between Diversity of Classification Ensembles and Single-Class Performance Measures", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 1, pp. 206-219, Jan. 2013, doi:10.1109/TKDE.2011.207