Issue No. 02 - March/April (2011 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.96
Sangyoon Oh, Ajou University, Suwon
Min Su Lee, Seoul National University, Seoul
Byoung-Tak Zhang, Seoul National University, Seoul
In biomedical data, the imbalanced data problem occurs frequently and causes poor prediction performance for minority classes, because the trained classifiers are derived mostly from the majority class. In this paper, we describe an ensemble learning method combined with active example selection to resolve the imbalanced data problem. Our method consists of three key components: 1) an active example selection algorithm that chooses informative examples for training the classifier, 2) an ensemble learning method that combines the variations of classifiers derived by active example selection, and 3) an incremental learning scheme that speeds up the iterative training procedure for active example selection. We evaluate the method on six real-world imbalanced data sets from biomedical domains and show that it outperforms both random undersampling and an ensemble with undersampling. Compared to other approaches to the imbalanced data problem, our method improves AUC by 0.03 to 0.15 points.
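The three components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the base classifier, the selection criterion, or the combination rule, so the sketch assumes a small logistic model trained by stochastic gradient descent, treats the majority examples closest to the current decision boundary as the "informative" ones, and averages member probabilities. All names (`LogisticModel`, `train_member`, `train_ensemble`, `ensemble_proba`, `make_toy_data`) and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


class LogisticModel:
    """Assumed base classifier: logistic regression trained by SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def partial_fit(self, examples, epochs=20):
        # Incremental learning: updates continue from the current weights
        # instead of retraining from scratch.
        for _ in range(epochs):
            for x, y in examples:
                err = self.predict_proba(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err


def train_member(minority, majority, rng, rounds=5):
    """Train one ensemble member with active example selection."""
    batch = len(minority)  # keep each training batch class-balanced
    pool = list(majority)
    rng.shuffle(pool)  # a different random start gives each member variety
    first, pool = pool[:batch], pool[batch:]
    model = LogisticModel(len(minority[0]))
    positives = [(x, 1) for x in minority]
    model.partial_fit(positives + [(x, 0) for x in first])
    for _ in range(rounds):
        if not pool:
            break
        # Assumed criterion: majority examples whose predicted probability
        # is closest to 0.5 are the most informative.
        pool.sort(key=lambda x: abs(model.predict_proba(x) - 0.5))
        chosen, pool = pool[:batch], pool[batch:]
        model.partial_fit(positives + [(x, 0) for x in chosen])
    return model


def train_ensemble(minority, majority, n_members=5, seed=0):
    rng = random.Random(seed)
    return [train_member(minority, majority, rng) for _ in range(n_members)]


def ensemble_proba(models, x):
    # Assumed combination rule: average the member probabilities.
    return sum(m.predict_proba(x) for m in models) / len(models)


def make_toy_data(rng, n_minority=10, n_majority=100):
    # Synthetic 2-D data, for illustration only.
    minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(n_minority)]
    majority = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(n_majority)]
    return minority, majority


demo_minority, demo_majority = make_toy_data(random.Random(0))
demo_models = train_ensemble(demo_minority, demo_majority)
print(ensemble_proba(demo_models, [3.0, 3.0]))  # score near the minority cluster
print(ensemble_proba(demo_models, [0.0, 0.0]))  # score near the majority cluster
```

Each member sees all minority examples but only a small, actively chosen subset of the majority class, so no single training batch is dominated by the majority class. Averaging several members smooths out the dependence on the random starting subset.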
Bioinformatics, classification, interactive data exploration and discovery, mining methods and algorithms.
S. Oh, M. S. Lee and B.-T. Zhang, "Ensemble Learning with Active Example Selection for Imbalanced Biomedical Data Classification," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 316-325, 2011.