Issue No.01 - January-February (2011 vol.8)
pp: 122-129
Deng-Feng Huang , Xiamen University, Xiamen
Ling-Jun Ye , Xiamen University, Xiamen
Qi-Feng Zhou , Xiamen University, Xiamen
Gui-Fang Shao , Xiamen University, Xiamen
Hong Peng , Xiamen University, Xiamen
Gene expression data usually comprise a large number of genes and a relatively small number of samples, which poses new challenges for analysis. Selecting the informative genes is therefore the main issue in microarray data analysis. Recursive cluster elimination based on support vector machines (SVM-RCE) has shown better classification accuracy on some microarray data sets than recursive feature elimination based on support vector machines (SVM-RFE). However, SVM-RCE is extremely time-consuming. In this paper, we propose an improved version of SVM-RCE called ISVM-RCE. ISVM-RCE first trains an SVM model with all clusters, then scores each cluster by the infinity norm of the weight coefficient vector restricted to that cluster, and finally eliminates the gene clusters with the lowest scores. In addition, when the number of clusters is small, ISVM-RCE eliminates individual genes within the clusters instead of removing whole clusters. We have tested ISVM-RCE on six gene expression data sets and compared its performance with that of SVM-RCE and linear-discriminant-analysis-based RFE (LDA-RFE). The experimental results on these data sets show that ISVM-RCE greatly reduces the time cost of SVM-RCE while achieving classification performance comparable to that of SVM-RCE, whereas LDA-RFE is not stable.
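The cluster-scoring step described above can be illustrated with a minimal sketch. It assumes that a linear SVM has already been trained once on all genes and that its weight vector and a gene-to-cluster assignment are available as plain lists. The function names, the example weights, and the `n_drop` parameter are hypothetical, since the abstract does not state how many clusters are removed per iteration. The within-cluster gene elimination used when few clusters remain is not shown.

```python
from collections import defaultdict


def score_clusters(weights, cluster_of):
    """Score each cluster by the infinity norm (largest |w_j|) of its genes' weights."""
    scores = defaultdict(float)
    for w, c in zip(weights, cluster_of):
        scores[c] = max(scores[c], abs(w))
    return dict(scores)


def eliminate_lowest(weights, cluster_of, n_drop=1):
    """Drop the n_drop lowest-scoring clusters; return surviving gene indices and dropped cluster ids.

    n_drop is an assumed parameter, not taken from the paper.
    """
    scores = score_clusters(weights, cluster_of)
    # Rank cluster ids by ascending score; ties are broken by cluster id for determinism.
    ranked = sorted(scores, key=lambda c: (scores[c], c))
    dropped = set(ranked[:n_drop])
    kept = [j for j, c in enumerate(cluster_of) if c not in dropped]
    return kept, dropped


# Hypothetical weights of a linear SVM trained once on six genes in three clusters.
weights = [0.9, -0.1, 0.05, -0.02, -0.6, 0.3]
cluster_of = [0, 0, 1, 1, 2, 2]

scores = score_clusters(weights, cluster_of)
kept, dropped = eliminate_lowest(weights, cluster_of)
print(scores, kept, dropped)
```

Because every cluster is scored from a single trained model, one elimination step needs one SVM fit rather than one fit per cluster, which is consistent with the time savings the abstract reports.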
Recursive cluster elimination, gene expression data, feature selection.
Deng-Feng Huang, Ling-Jun Ye, Qi-Feng Zhou, Gui-Fang Shao, Hong Peng, "Improving the Computational Efficiency of Recursive Cluster Elimination for Gene Selection", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 1, pp. 122-129, January-February 2011, doi:10.1109/TCBB.2010.44