An Evaluation of the Robustness of MTS for Imbalanced Data
October 2007 (vol. 19 no. 10)
pp. 1321-1332

Abstract—In classification problems, class imbalance biases the training of classifiers and lowers their sensitivity in detecting minority-class examples. The Mahalanobis-Taguchi System (MTS) is a diagnosis and forecasting technique for multivariate data. MTS establishes a classifier by constructing a continuous measurement scale rather than by learning directly from the training set. The construction of an MTS model is therefore expected not to be influenced by the data distribution, a property that helps to overcome the class imbalance problem. To verify the robustness of MTS for imbalanced data, this study compares MTS with several popular classification techniques. The results indicate that MTS is the most robust of the compared techniques for classifying imbalanced data. In addition, this study develops a "probabilistic thresholding method" to determine the classification threshold for MTS, and the method performs well. Finally, MTS is employed to analyze the RF inspection process in mobile phone manufacturing. The data collected from the RF inspection process are typically imbalanced. Implementation results show that the number of inspection attributes is significantly reduced while the RF inspection process maintains high inspection accuracy.
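The "continuous measurement scale" mentioned in the abstract can be illustrated with a minimal sketch of a scaled Mahalanobis distance (MD) built from the normal group only, which is the usual starting point of MTS. The function names (`fit_mahalanobis_space`, `scaled_md`) and the synthetic data are assumptions made for illustration. The percentile cutoff is only a placeholder and is not the paper's probabilistic thresholding method, whose details are not given here.

```python
import numpy as np

def fit_mahalanobis_space(normal):
    """Estimate the reference scale from normal-group samples only (rows = samples)."""
    normal = np.asarray(normal, dtype=float)
    mean = normal.mean(axis=0)
    std = normal.std(axis=0, ddof=1)
    z = (normal - mean) / std                 # standardize each attribute
    corr = np.corrcoef(z, rowvar=False)       # correlation matrix of the normal group
    return mean, std, np.linalg.inv(corr)

def scaled_md(x, mean, std, corr_inv):
    """Scaled Mahalanobis distance: z^T C^{-1} z divided by the number of attributes."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    z = (x - mean) / std
    k = z.shape[1]
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k

# Hypothetical data: 200 normal samples with 3 attributes.
rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 3))
mean, std, corr_inv = fit_mahalanobis_space(normal)

md_normal = scaled_md(normal, mean, std, corr_inv)   # averages close to 1 by construction

# Placeholder threshold (an assumption, NOT the paper's method):
# flag a sample as abnormal if its MD exceeds the 99th percentile of normal-group MDs.
threshold = np.percentile(md_normal, 99)
candidate = mean + 10 * std                           # a sample far from the normal group
is_abnormal = scaled_md(candidate, mean, std, corr_inv)[0] > threshold
```

Because the scale is estimated from the normal group alone, the number of abnormal examples does not enter the fit, which is the intuition behind the robustness claim in the abstract.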

Index Terms:
Data mining, classification, class imbalance problem, imbalanced data, Mahalanobis-Taguchi System (MTS), threshold, mobile phone inspection
Citation:
Chao-Ton Su, Yu-Hsiang Hsiao, "An Evaluation of the Robustness of MTS for Imbalanced Data," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 10, pp. 1321-1332, Oct. 2007, doi:10.1109/TKDE.2007.190623