Issue No. 01 - January-February (2011 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.43
K.Z. Mao, Nanyang Technological University, Singapore
Wenyin Tang, Nanyang Technological University, Singapore
The Mahalanobis class separability measure provides an effective evaluation of the discriminative power of a feature subset and is widely used in feature selection. However, the measure is computationally intensive, or even prohibitive, when applied to gene expression data. In this study, a recursive approach to evaluating the Mahalanobis measure is proposed, with the goal of reducing computational overhead. Instead of evaluating the Mahalanobis measure directly in high-dimensional space, the recursive approach evaluates it through successive evaluations in 2D space. Because of its recursive nature, the approach is extremely efficient when combined with a forward search procedure. In addition, it is noted that gene subsets selected by the Mahalanobis measure tend to overfit the training data and generalize unsatisfactorily to unseen test data, owing to the small sample sizes typical of gene expression problems. To alleviate the overfitting problem, a regularized recursive Mahalanobis measure is proposed, and guidelines for determining the regularization parameters are provided. Experimental studies on five gene expression problems show that the regularized recursive Mahalanobis measure substantially outperforms the nonregularized Mahalanobis measures and the benchmark recursive feature elimination (RFE) algorithm in all five problems.
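To make the idea of a regularized Mahalanobis measure combined with forward search concrete, here is a minimal Python sketch. It is not the paper's recursive 2D evaluation: it computes the two-class measure directly in the subset's full dimension, and it assumes a simple ridge-style regularizer (adding `lam` times the identity to the pooled covariance), which may differ from the regularization the authors propose. The function names, the `lam` parameter, and the 0/1 label encoding are all hypothetical.

```python
import numpy as np

def mahalanobis_separability(X, y, subset, lam=0.0):
    """Two-class Mahalanobis measure d^T (S + lam*I)^{-1} d on a feature subset.

    Assumptions (not from the paper): labels are 0/1, S is the pooled
    within-class covariance, and lam is a ridge-style regularizer.
    """
    idx = list(subset)
    X0 = X[y == 0][:, idx]
    X1 = X[y == 1][:, idx]
    d = X0.mean(axis=0) - X1.mean(axis=0)      # difference of class means
    C0 = X0 - X0.mean(axis=0)                  # centered class-0 samples
    C1 = X1 - X1.mean(axis=0)                  # centered class-1 samples
    S = (C0.T @ C0 + C1.T @ C1) / (len(X0) + len(X1) - 2)
    S = S + lam * np.eye(len(idx))             # regularize the covariance
    return float(d @ np.linalg.solve(S, d))

def forward_select(X, y, k, lam=0.0):
    """Greedy forward search: add the feature that most increases the measure."""
    selected = []
    remaining = set(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = max(
            sorted(remaining),
            key=lambda j: mahalanobis_separability(X, y, selected + [j], lam),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each candidate evaluation here solves a fresh linear system, so the cost grows with subset size; that per-step cost is what the recursive evaluation described in the abstract is meant to reduce. With few samples and many genes, `lam > 0` also keeps the covariance estimate invertible.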
Gene selection, recursive Mahalanobis measure, regularized Mahalanobis measure.
W. Tang and K. Mao, "Recursive Mahalanobis Separability Measure for Gene Subset Selection," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 1, pp. 266-272, 2011.