IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, October 2010, pp. 1388-1400
Mike Wasikowski, United States Army Training and Doctrine Command Analysis Center, Fort Leavenworth
The class imbalance problem is encountered in real-world applications of machine learning and results in suboptimal classifier performance. Researchers have rigorously studied resampling, algorithmic, and feature selection approaches to this problem, but no systematic study has examined how well these methods combat the class imbalance problem or which of them best manages the different challenges posed by imbalanced data sets. In particular, feature selection has rarely been studied outside of text classification problems. Additionally, no study has addressed the compounding difficulty of learning from small samples. This paper presents a first systematic comparison of the three types of methods developed for imbalanced data classification problems and of seven feature selection metrics evaluated on small sample data sets from different applications. We evaluated the performance of these metrics using area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (PRC). We compared each metric on its average performance across all problems and on the likelihood of its yielding the best performance on a specific problem. We examined the performance of these metrics within each problem domain. Finally, we evaluated the efficacy of these metrics to see which perform best across algorithms. Our results showed that the signal-to-noise correlation coefficient (S2N) and Feature Assessment by Sliding Thresholds (FAST) are strong candidates for feature selection in most applications, especially when selecting very small numbers of features.
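To make the two recommended metrics concrete, the sketch below ranks features with the signal-to-noise correlation coefficient of Golub et al. [48], |&mu;+ &minus; &mu;&minus;| / (&sigma;+ + &sigma;&minus;), and with a per-feature rank-based AUC, which is the quantity FAST [37] approximates using histograms over sliding thresholds. This is a minimal illustration on toy data, not the authors' implementation; the data set, variable names, and the zero-variance guard are assumptions.

```python
import numpy as np

def s2n_scores(X, y):
    """Signal-to-noise correlation coefficient per feature (Golub et al.):
    |mu_pos - mu_neg| / (sigma_pos + sigma_neg). Higher = more discriminative."""
    pos, neg = X[y == 1], X[y == 0]
    num = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    den = pos.std(axis=0) + neg.std(axis=0)
    return num / np.maximum(den, 1e-12)  # guard against zero-variance features

def per_feature_auc(X, y):
    """AUC of each single feature used as a classifier score, via the
    Mann-Whitney U statistic; FAST approximates this with sliding thresholds."""
    n_pos, n_neg = int((y == 1).sum()), int((y == 0).sum())
    aucs = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        ranks = X[:, j].argsort().argsort() + 1  # ranks 1..n (ties ignored here)
        u = ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2
        auc = u / (n_pos * n_neg)
        aucs[j] = max(auc, 1 - auc)  # direction-agnostic ranking score
    return aucs

# Toy imbalanced sample: 5 positives vs. 20 negatives, 2 features,
# where feature 0 separates the classes and feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([1] * 5 + [0] * 20)
X = rng.normal(size=(25, 2))
X[y == 1, 0] += 3.0  # shift positives along feature 0

print(s2n_scores(X, y))       # feature 0 scores well above feature 1
print(per_feature_auc(X, y))  # feature 0's per-feature AUC is near 1.0
```

Both metrics are filter methods: they score each feature independently of any classifier, which keeps them cheap enough for the very high-dimensional, small-sample settings (e.g., microarray data) studied in the paper.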
Index Terms: Class imbalance problem, feature evaluation and selection, machine learning, pattern recognition, bioinformatics, text mining.
Mike Wasikowski, "Combating the Small Sample Class Imbalance Problem Using Feature Selection," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1388-1400, October 2010, doi: 10.1109/TKDE.2009.187.
[1] C. Elkan, "The Foundations of Cost-Sensitive Learning," Proc. 17th Int'l Joint Conf. Artificial Intelligence, pp. 973-978, 2001.
[2] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene Selection for Cancer Classification Using Support Vector Machines," Machine Learning, vol. 46, nos. 1-3, pp. 389-422, 2002.
[3] D. Mladenić and M. Grobelnik, "Feature Selection for Unbalanced Class Distribution and Naive Bayes," Proc. 16th Int'l Conf. Machine Learning, pp. 258-267, 1999.
[4] G. Forman, "An Extensive Empirical Study of Feature Selection Metrics for Text Classification," J. Machine Learning Research, vol. 3, pp. 1289-1305, 2003.
[5] Z. Zheng, X. Wu, and R. Srihari, "Feature Selection for Text Categorization on Imbalanced Data," ACM SIGKDD Explorations Newsletter, vol. 6, pp. 80-89, 2004.
[6] D. Casasent and X. Chen, "Feature Reduction and Morphological Processing for Hyperspectral Image Data," Applied Optics, vol. 43, no. 2, pp. 1-10, 2004.
[7] Proc. AAAI '00 Workshop Learning from Imbalanced Data Sets, N. Japkowicz, ed., 2000.
[8] G. Forman and I. Cohen, "Learning from Little: Comparison of Classifiers Given Little Training," Proc. Eighth European Conf. Principles and Practice of Knowledge Discovery in Databases, pp. 161-172, 2004.
[9] Y.J. Cui, S. Davis, C.K. Cheng, and X. Bai, "A Study of Sample Size with Neural Network," Proc. Third Int'l Conf. Machine Learning and Cybernetics, pp. 3444-3448, 2004.
[10] Proc. ICML '03 Workshop Learning from Imbalanced Data Sets, N. Chawla, N. Japkowicz, and A. Kolcz, eds., 2003.
[11] G. Weiss, "Mining with Rarity: A Unifying Framework," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 7-19, 2004.
[12] G. Weiss and F. Provost, "Learning when Training Data Are Costly: The Effect of Class Distribution on Tree Induction," J. Artificial Intelligence Research, vol. 19, pp. 315-354, 2003.
[13] N. Chawla, N. Japkowicz, and A. Kolcz, "Editorial: Special Issue on Learning from Imbalanced Data Sets," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 1-6, 2004.
[14] M. Kubat and S. Matwin, "Addressing the Curse of Imbalanced Data Sets: One Sided Sampling," Proc. 14th Int'l Conf. Machine Learning, pp. 179-186, 1997.
[15] X. Chen, B. Gerlach, and D. Casasent, "Pruning Support Vectors for Imbalanced Data Classification," Proc. Int'l Joint Conf. Neural Networks, pp. 1883-1888, 2005.
[16] M. Kubat and S. Matwin, "Learning When Negative Examples Abound," Proc. Ninth European Conf. Machine Learning (ECML '97), pp. 146-153, 1997.
[17] N. Chawla, K. Bowyer, L. Hall, and P. Kegelmeyer, "SMOTE: Synthetic Minority Over-Sampling Technique," J. Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
[18] A. Estabrooks, T. Jo, and N. Japkowicz, "A Multiple Resampling Method for Learning from Imbalanced Data Sets," Computational Intelligence, vol. 20, no. 1, pp. 18-36, 2004.
[19] C. Drummond and R.C. Holte, "Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria," Proc. 17th Int'l Conf. Machine Learning, pp. 239-246, 2000.
[20] N. Japkowicz, "Supervised versus Unsupervised Binary Learning by Feedforward Neural Networks," Machine Learning, vol. 42, nos. 1/2, pp. 97-122, 2001.
[21] A. Raskutti and A. Kowalczyk, "Extreme Rebalancing for SVMs: A SVM Study," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 60-69, 2004.
[22] L.M. Manevitz and M. Yousef, "One-Class SVMs for Document Classification," J. Machine Learning Research, vol. 2, pp. 139-154, 2001.
[23] C. Elkan and K. Noto, "Learning Classifiers from Only Positive and Unlabeled Data," Proc. ACM SIGKDD '08, pp. 213-220, 2008.
[24] E. Alpaydin, Introduction to Machine Learning, pp. 43-45, 360-363. MIT Press, 2004.
[25] N. Chawla, A. Lazarevic, L. Hall, and K. Bowyer, "SMOTEBoost: Improving Prediction of the Minority Class in Boosting," Proc. Seventh European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD), pp. 107-119, 2003.
[26] Y. Sun, M. Kamel, and Y. Wang, "Boosting for Learning Multiple Classes with Imbalanced Class Distribution," Proc. Sixth Int'l Conf. Data Mining, pp. 592-602, 2006.
[27] P. Domingos, "MetaCost: A General Method for Making Classifiers Cost-Sensitive," Proc. ACM SIGKDD '99, pp. 155-164, 1999.
[28] T. Fawcett and F. Provost, "Adaptive Fraud Detection," Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 291-316, 1997.
[29] N. Abe, B. Zadrozny, and J. Langford, "An Iterative Method for Multi-Class Cost-Sensitive Learning," Proc. ACM SIGKDD '04, pp. 3-11, 2004.
[30] H. Masnadi-Shirazi and N. Vasconcelos, "Asymmetric Boosting," Proc. 24th Int'l Conf. Machine Learning, pp. 609-619, 2007.
[31] K. Huang, H. Yang, I. King, and M. Lyu, "Learning Classifiers from Imbalanced Data Based on Biased Minimax Probability Machine," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, vol. 2, no. 27, pp. II-558-II-563, 2004.
[32] K. Ting, "The Problem of Small Disjuncts: Its Remedy on Decision Trees," Proc. 10th Canadian Conf. Artificial Intelligence, pp. 91-97, 1994.
[33] U. Brefeld and T. Scheffer, "AUC Maximizing Support Vector Learning," Proc. Int'l Conf. Machine Learning (ICML) Workshop ROC Analysis in Machine Learning, 2005.
[34] T. Joachims, "Training Linear SVMs in Linear Time," Proc. ACM SIGKDD '06, pp. 217-226, 2006.
[35] H. Xiong and X. Chen, "Kernel-Based Distance Metric Learning for Microarray Data Classification," BMC Bioinformatics, vol. 7, no. 299, pp. 1-11, 2006.
[36] P. van der Putten and M. van Someren, "A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000," Machine Learning, vol. 57, nos. 1/2, pp. 177-195, 2004.
[37] X. Chen and M. Wasikowski, "FAST: A ROC-Based Feature Selection Metric for Small Samples and Imbalanced Data Classification Problems," Proc. ACM SIGKDD '08, pp. 124-133, 2008.
[38] C. Elkan, "Magical Thinking in Data Mining: Lessons from CoIL Challenge 2000," Proc. ACM SIGKDD '01, pp. 426-431, 2001.
[39] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," J. Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[40] J. Loughrey and P. Cunningham, "Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse It Gets," Proc. 24th SGAI Int'l Conf. Innovative Techniques and Applications of Artificial Intelligence, pp. 33-43, 2004.
[41] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik, "Feature Selection for Support Vector Machines," Advances in Neural Information Processing Systems, MIT Press, 2000.
[42] X. Chen, "An Improved Branch and Bound Algorithm for Feature Selection," Pattern Recognition Letters, vol. 24, no. 12, pp. 1925-1933, 2003.
[43] X. Chen and J.C. Jeong, "Minimum Reference Set Based Feature Selection for Small Sample Classifications," Proc. 24th Int'l Conf. Machine Learning, pp. 153-160, 2006.
[44] L. Yu and H. Liu, "Efficient Feature Selection via Analysis of Relevance and Redundancy," J. Machine Learning Research, vol. 5, pp. 1205-1224, 2004.
[45] P. Pudil, J. Novovicova, and J. Kittler, "Floating Search Methods in Feature Selection," Pattern Recognition Letters, vol. 15, pp. 1119-1125, 1994.
[46] O. Lund, M. Nielsen, C. Lundegaard, C. Kesmir, and S. Brunak, Immunological Bioinformatics, pp. 99-101. MIT Press, 2005.
[47] J. Davis and M. Goadrich, "The Relationship between Precision-Recall and ROC Curves," Proc. 23rd Int'l Conf. Machine Learning, pp. 30-38, 2006.
[48] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring," Science, vol. 286, pp. 531-537, 1999.
[49] J.V. Hulse, T.M. Khoshgoftaar, and A. Napolitano, "Experimental Perspectives on Learning from Imbalanced Data," Proc. 24th Int'l Conf. Machine Learning, pp. 935-942, 2007.
[50] A. Al Shahib, R. Breitling, and D. Gilbert, "Feature Selection and the Class Imbalance Problem in Predicting Protein Function from Sequence," Applied Bioinformatics, vol. 4, pp. 195-203, 2005.
[51] K. Kira and L. Rendell, "The Feature Selection Problem: Traditional Methods and New Algorithm," Proc. Ninth Int'l Conf. Machine Learning, pp. 249-256, 1992.
[52] I. Kononenko, "Estimating Attributes: Analysis and Extensions of RELIEF," Proc. Seventh European Conf. Machine Learning, pp. 171-182, 1994.
[53] O. Luaces, "MEX Interface for SVMperf," 2008.
[54] O. Luaces, "SVMperf MATLAB Spider Object," 2008.
[55] M. Dredze, K. Crammer, and F. Pereira, "Confidence-Weighted Linear Classification," Proc. 25th Int'l Conf. Machine Learning, pp. 264-271, 2008.
[56] T. Hertz, A.B. Hillel, and D. Weinshall, "Learning a Kernel Function for Classification with Small Training Samples," Proc. 23rd Int'l Conf. Machine Learning, pp. 401-408, 2006.
[57] D. Koller and M. Sahami, "Toward Optimal Feature Selection," Proc. 13th Int'l Conf. Machine Learning, pp. 284-292, 1996.
[58] F. Fleuret, "Fast Binary Feature Selection with Conditional Mutual Information," J. Machine Learning Research, vol. 5, pp. 1531-1555, 2004.
[59] H. Peng, F. Long, and C. Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238, Aug. 2005.