Feature Selection Metric Using AUC Margin for Small Samples and Imbalanced Data Classification Problems
Honolulu, Hawaii USA
Dec. 18, 2011 to Dec. 21, 2011
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICMLA.2011.70
Feature selection helps address high-dimensional problems by retaining only the features that are most important for the classification task. However, traditional feature selection methods fail to account for imbalanced class distributions, leading to poor predictions for minority class samples. Recently, interest has grown in the Area Under the ROC Curve (AUC) metric because it provides meaningful performance measures in the presence of imbalanced data. In this paper, we propose a new margin-based feature selection metric that defines the quality of a set of features by the maximized AUC margin it induces while learning with boosting. Our algorithm measures the cumulative effect each feature has on the margin distribution associated with the weighted linear combination that boosting produces over the positive and the negative examples. Experiments on various real imbalanced data sets show the effectiveness of our algorithm at selecting informative features from small data sets with skewed class distributions.
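To make the notions of AUC and pairwise margin concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a fixed linear scorer f(x) = w·x standing in for the weighted combination that boosting would produce; the function names, the toy data, and the per-feature decomposition are illustrative assumptions. The margin of a positive/negative pair is taken to be f(x+) - f(x-), and AUC is the fraction of pairs with a positive margin.

```python
import numpy as np

def pairwise_margins(scores, labels):
    """Score differences f(x_pos) - f(x_neg) for every positive/negative pair."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    return (pos[:, None] - neg[None, :]).ravel()

def auc_from_margins(margins):
    """AUC as the fraction of correctly ordered pairs (ties count one half)."""
    margins = np.asarray(margins, dtype=float)
    return float(np.mean(margins > 0) + 0.5 * np.mean(margins == 0))

def feature_margin_contributions(X, labels, weights):
    """Mean pairwise margin contributed by each feature of a linear scorer.

    Assumption: the scorer is f(x) = X @ weights, so the mean margin over all
    positive/negative pairs splits into one additive term per feature.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    contrib = X * np.asarray(weights, dtype=float)  # per-sample, per-feature score terms
    return contrib[labels == 1].mean(axis=0) - contrib[labels == 0].mean(axis=0)

# Hypothetical toy data: two features, two positives, two negatives.
X = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
y = np.array([1, 1, 0, 0])
w = np.array([1.0, 0.5])  # stand-in for weights learned by boosting

margins = pairwise_margins(X @ w, y)
auc = auc_from_margins(margins)
per_feature = feature_margin_contributions(X, y, w)
```

In this toy case the per-feature terms sum to the mean pairwise margin, so a feature whose term is near zero contributes little to separating the two classes. Because both quantities are computed over positive/negative pairs, neither depends on the class ratio, which is why AUC-based measures are attractive for skewed data.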
boosting, area under the ROC curve (AUC), feature selection, margin
Javed A. Aslam, Jennifer Dy, David Kaeli, "Feature Selection Metric Using AUC Margin for Small Samples and Imbalanced Data Classification Problems", ICMLA, 2011, Machine Learning and Applications, Fourth International Conference on, pp. 145-150, doi:10.1109/ICMLA.2011.70