Issue No. 01 - Jan. (2016 vol. 28)
ISSN: 1041-4347
pp: 238-251
Lida Abdi, Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
Sattar Hashemi, Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
ABSTRACT
The class imbalance problem is quite pervasive in present-day practice. It refers to skewness in the underlying data distribution, which in turn poses many difficulties for typical machine learning algorithms. Existing efforts to deal with the issues arising from multi-class skewed distributions fall mainly into two categories: model-oriented solutions and data-oriented techniques. Focusing on the latter, this paper presents a new over-sampling technique inspired by the Mahalanobis distance. The proposed technique, called MDO (Mahalanobis Distance-based Over-sampling technique), generates synthetic samples that have the same Mahalanobis distance from the considered class mean as other minority class examples. By preserving the covariance structure of the minority class instances and intelligently generating synthetic samples along the probability contours, MDO models new minority class instances better for learning algorithms. Moreover, MDO can reduce the risk of overlap between different class regions, which is considered a serious challenge in multi-class problems. Our theoretical analyses and empirical observations across a wide spectrum of multi-class imbalanced benchmarks indicate that MDO is the method of choice, offering statistically superior MAUC and precision compared to popular over-sampling techniques.
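The abstract describes MDO only at a high level. The sketch below illustrates the core idea under stated assumptions, and it is not the authors' exact procedure. For one minority class with mean μ and covariance Σ, each existing example x has a squared Mahalanobis distance α = (x − μ)ᵀ Σ⁻¹ (x − μ), and a synthetic point is placed on the same contour as a chosen seed example. Assumptions: the class is given as a NumPy array; seeds are drawn uniformly at random (rather than by whatever selection rule the paper uses); the point on the contour is found by picking a random direction in the whitened principal-component space; and the names `mdo_like_oversample`, `mahalanobis_sq` and the `ridge` regularization parameter are hypothetical.

```python
import numpy as np


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of each row of x from mean under cov."""
    diff = np.atleast_2d(x) - mean
    # Solve cov @ y = diff^T instead of forming an explicit inverse.
    return np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))


def mdo_like_oversample(X_min, n_new, ridge=1e-6, rng=None):
    """Hypothetical MDO-style sketch: n_new synthetic points for one class.

    Each synthetic point lies on the same Mahalanobis contour (around the
    class mean) as a randomly chosen seed example from X_min.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    d = X_min.shape[1]
    mean = X_min.mean(axis=0)
    # Class covariance; the small ridge (an assumption) keeps it invertible.
    cov = np.cov(X_min, rowvar=False) + ridge * np.eye(d)
    # Eigendecomposition cov = V diag(lam) V^T (principal axes of the class).
    lam, V = np.linalg.eigh(cov)
    # Squared Mahalanobis distance (alpha) of every minority example.
    alpha = mahalanobis_sq(X_min, mean, cov)
    # Seeds chosen uniformly at random (simplifying assumption).
    seeds = rng.integers(0, len(X_min), size=n_new)
    # Random directions, uniform on the unit sphere in whitened space.
    u = rng.standard_normal((n_new, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Radius sqrt(alpha) in whitened space, then map back with
    # x = mean + V diag(sqrt(lam)) z, so the distance of x equals alpha.
    z = u * np.sqrt(alpha[seeds])[:, None]
    return mean + (z * np.sqrt(lam)) @ V.T


# Example: add 30 synthetic points to a 2-D minority class of 20 points.
demo_rng = np.random.default_rng(0)
X_demo = demo_rng.multivariate_normal([0.0, 0.0], [[2.0, 0.8], [0.8, 1.0]], size=20)
X_new = mdo_like_oversample(X_demo, n_new=30, rng=1)
print(X_new.shape)  # (30, 2)
```

Because the eigendecomposition maps the class's covariance ellipsoid onto a sphere, scaling a random unit direction by √α and mapping it back keeps every synthetic point exactly on its seed's Mahalanobis contour, so the class covariance structure shapes where new samples appear.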
INDEX TERMS
Mathematical model, Training, Accuracy, Eigenvalues and eigenfunctions, Machine learning algorithms, Algorithm design and analysis, Benchmark testing
CITATION

L. Abdi and S. Hashemi, "To Combat Multi-Class Imbalanced Problems by Means of Over-Sampling Techniques," in IEEE Transactions on Knowledge & Data Engineering, vol. 28, no. 1, pp. 238-251, 2016.
doi:10.1109/TKDE.2015.2458858