Combating the Small Sample Class Imbalance Problem Using Feature Selection
PrePrint
ISSN: 1041-4347
Mike Wasikowski, United States Army Training and Doctrine Command Analysis Center, Fort Leavenworth
Xue-wen Chen, The University of Kansas, Lawrence
Researchers have rigorously studied resampling, algorithmic, and feature selection approaches to the class imbalance problem. However, no systematic studies have examined how well these methods combat the class imbalance problem or which of them best manages the different challenges posed by imbalanced data sets. In particular, feature selection has rarely been studied outside of text classification problems. Additionally, no studies have looked at the further problem of learning from small samples. This paper presents a first systematic comparison of the three types of methods and of seven feature selection metrics, evaluated on small sample data sets from different applications. We evaluated the performance of these metrics using the area under the receiver operating characteristic curve and the area under the precision-recall curve. We compared the metrics on their average performance across all problems and on the likelihood that a metric yields the best performance on a specific problem. We also examined the performance of these metrics within each problem domain. Finally, we evaluated the efficacy of these metrics to see which perform best across algorithms. Our results showed that signal-to-noise ratio and Feature Assessment by Sliding Thresholds are strong candidates for feature selection in most applications, especially when selecting very small numbers of features.
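As an illustration of what a signal-to-noise feature selection metric can look like, the following is a minimal sketch, not the authors' implementation. The abstract does not define the metric, so the formula used here, |mean_pos - mean_neg| / (std_pos + std_neg), is an assumption based on the commonly used definition. The function names `signal_to_noise` and `rank_features`, the label convention (1 = minority class, 0 = majority class), and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def signal_to_noise(feature_values, labels):
    """Score one feature for binary labels (assumed: 1 = minority, 0 = majority).

    Assumed formula: |mean_pos - mean_neg| / (std_pos + std_neg).
    """
    pos = [v for v, y in zip(feature_values, labels) if y == 1]
    neg = [v for v, y in zip(feature_values, labels) if y == 0]
    gap = abs(mean(pos) - mean(neg))
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        # No within-class variation: perfectly separating if the means
        # differ, otherwise the feature carries no information.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_features(rows, labels, k):
    """Return the indices of the k highest-scoring features."""
    n_features = len(rows[0])
    scores = [
        signal_to_noise([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: 5 samples, 2 features, 2 minority-class samples.
rows = [[1.0, 5.0], [1.2, 1.0], [0.9, 4.0], [3.0, 2.0], [3.1, 4.5]]
labels = [0, 0, 0, 1, 1]
top = rank_features(rows, labels, k=1)  # feature 0 separates the classes
```

A filter of this kind scores each feature independently of any classifier, which is one reason such metrics remain usable when only a handful of minority-class samples are available.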
Index Terms:
Data mining, Machine learning
Citation:
Mike Wasikowski, Xue-wen Chen, "Combating the Small Sample Class Imbalance Problem Using Feature Selection," IEEE Transactions on Knowledge and Data Engineering, 25 Sept. 2009. IEEE Computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.187>