Issue No. 02 - February (2009 vol. 21)

pp: 192-205

Chao-Ton Su, National Tsing Hua University, Hsinchu

Yu-Hsiang Hsiao, National Tsing Hua University, Hsinchu

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.128

ABSTRACT

The multiclass Mahalanobis-Taguchi system (MMTS), an extension of MTS, is developed for simultaneous multiclass classification and feature selection. In MMTS, the multiclass measurement scale is constructed by establishing an individual Mahalanobis space for each class. To increase the validity of the measurement scale, the Gram-Schmidt process is applied to mutually orthogonalize the features and eliminate multicollinearity. The important features are identified using orthogonal arrays and the signal-to-noise ratio, and are then used to construct a reduced-model measurement scale. The contribution of each important feature to classification is derived from its effect gain and used to build a weighted Mahalanobis distance, which serves as the distance metric for classification in MMTS. Using the reduced-model measurement scale, an unknown example is assigned to the class with the minimum weighted Mahalanobis distance, computed over only the important features. To evaluate the effectiveness of MMTS, a numerical experiment is conducted; the results show that MMTS outperforms other well-known algorithms in both classification accuracy and feature selection efficiency. Finally, a real case of gestational diabetes mellitus is studied, and the results indicate the practicality of MMTS in real-world applications.
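The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MahalanobisSpace`, the function `classify`, and the `weights` argument are hypothetical, and the weights are simply supplied by the caller rather than derived from effect gains. Feature selection with orthogonal arrays and signal-to-noise ratios is not shown. The sketch builds one Mahalanobis space per class, orthogonalizes the standardized features with classical Gram-Schmidt, and assigns a sample to the class with the smallest (optionally weighted) distance.

```python
import numpy as np


class MahalanobisSpace:
    """Reference space for one class (hypothetical helper, not from the paper)."""

    def __init__(self, X):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0, ddof=1)
        Z = (X - self.mean) / self.std          # standardized training data
        k = Z.shape[1]
        self.T = np.zeros((k, k))               # Gram-Schmidt coefficients
        U = np.zeros_like(Z)
        for j in range(k):
            U[:, j] = Z[:, j]
            for i in range(j):
                # Projection of feature j onto the already orthogonalized feature i
                self.T[i, j] = (Z[:, j] @ U[:, i]) / (U[:, i] @ U[:, i])
                U[:, j] -= self.T[i, j] * U[:, i]
        self.var = U.var(axis=0, ddof=1)        # variance of each orthogonal feature

    def orthogonalize(self, x):
        """Apply the stored standardization and Gram-Schmidt coefficients to one sample."""
        z = (np.asarray(x, dtype=float) - self.mean) / self.std
        u = np.zeros_like(z)
        for j in range(len(z)):
            u[j] = z[j] - sum(self.T[i, j] * u[i] for i in range(j))
        return u

    def distance(self, x, weights=None):
        """Weighted sum of squared orthogonal components over their variances.

        With weights=None every feature gets weight 1/k, which gives the
        scaled Mahalanobis distance (squared distance divided by k).
        """
        u = self.orthogonalize(x)
        k = len(u)
        w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
        return float(np.sum(w * u**2 / self.var))


def classify(x, spaces, weights=None):
    """Return the label of the class space with the minimum distance to x."""
    return min(spaces, key=lambda label: spaces[label].distance(x, weights))


# Usage with synthetic data (illustrative only)
rng = np.random.default_rng(0)
spaces = {
    "class_a": MahalanobisSpace(rng.normal(0.0, 1.0, size=(50, 3))),
    "class_b": MahalanobisSpace(rng.normal(10.0, 1.0, size=(50, 3))),
}
label = classify([0.2, -0.1, 0.4], spaces)
```

Because the Gram-Schmidt components are uncorrelated, the distance reduces to a sum of per-feature terms, which is what makes it straightforward to attach a separate weight to each feature.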

INDEX TERMS

Classification, feature selection, multiclass problem, Mahalanobis-Taguchi system (MTS), weighted Mahalanobis distance, Gram-Schmidt orthogonalization process, gestational diabetes mellitus.

CITATION

Chao-Ton Su, Yu-Hsiang Hsiao, "Multiclass MTS for Simultaneous Feature Selection and Classification", *IEEE Transactions on Knowledge & Data Engineering*, vol. 21, no. 2, pp. 192-205, February 2009, doi:10.1109/TKDE.2008.128