Extracting a subset of informative genes from microarray expression data is a critical data preparation step in cancer classification and other biological function analyses. Among the many algorithms developed for this purpose, the Support Vector Machine - Recursive Feature Elimination (SVM-RFE) algorithm is one of the best gene feature selection algorithms. SVM-RFE rests on the assumption that a smaller "filter-out" factor, which results in fewer gene features being eliminated in each recursion, should lead to extraction of a better gene subset. Our simulations show that this assumption is not always correct: the SVM-RFE is highly sensitive to the "filter-out" factor and is therefore an unstable algorithm. To select a set of key gene features for reliable prediction of cancer types or subtypes and for other applications, we have developed a new two-stage SVM-RFE algorithm. The first stage is designed to eliminate most of the irrelevant, redundant and noisy genes while keeping information loss small. The second stage then performs a fine selection to obtain the final gene subset. The two-stage SVM-RFE overcomes the instability problem of the SVM-RFE and thereby achieves better algorithm utility. Our analysis of three publicly available microarray expression datasets demonstrates that the two-stage SVM-RFE is significantly more accurate and more reliable than the SVM-RFE and three correlation-based methods. Furthermore, the two-stage SVM-RFE is computationally efficient: its time complexity is $O(d \log_2 d)$, where $d$ is the size of the original gene set.
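The recursion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: the linear SVM is trained with plain batch subgradient descent, genes are ranked by squared weight (the usual SVM-RFE criterion), and stage one is reduced to a single coarse SVM-RFE pass with a large "filter-out" factor, since the abstract does not say how the first stage is carried out. The names `train_linear_svm`, `svm_rfe`, `two_stage_svm_rfe` and all parameter values are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM by batch subgradient descent on the regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w


def svm_rfe(X, y, features, target, filter_out):
    """Repeatedly drop the `filter_out` fraction of lowest-ranked genes.

    At least one gene is dropped per recursion, so filter_out=0.0 means
    one-by-one elimination. Stops when `target` genes remain.
    """
    features = list(features)
    while len(features) > target:
        X_sub = [[row[j] for j in features] for row in X]
        w = train_linear_svm(X_sub, y)
        # Rank by squared weight, smallest (least useful) first.
        order = sorted(range(len(features)), key=lambda k: w[k] ** 2)
        n_drop = max(1, int(len(features) * filter_out))
        n_drop = min(n_drop, len(features) - target)
        dropped = set(order[:n_drop])
        features = [f for k, f in enumerate(features) if k not in dropped]
    return features


def two_stage_svm_rfe(X, y, coarse_target, final_target, coarse_filter_out=0.5):
    """Assumed two-stage scheme: coarse pruning, then fine one-by-one selection."""
    stage_one = svm_rfe(X, y, range(len(X[0])), coarse_target, coarse_filter_out)
    return svm_rfe(X, y, stage_one, final_target, filter_out=0.0)


# Toy data (made up): 8 samples, 6 "genes"; only genes 0 and 1 track the class label.
y = [1, 1, 1, 1, -1, -1, -1, -1]
columns = [
    [2.0 * yi for yi in y],        # gene 0: strongly informative
    [0.5 * yi for yi in y],        # gene 1: weakly informative
    [0.1, -0.1, 0.1, -0.1] * 2,    # genes 2-5: class-independent noise
    [0.1, 0.1, -0.1, -0.1] * 2,
    [0.1, -0.1, -0.1, 0.1] * 2,
    [0.3, -0.1, -0.1, -0.1] * 2,
]
X = [list(row) for row in zip(*columns)]

selected = two_stage_svm_rfe(X, y, coarse_target=3, final_target=2)
print(selected)
```

With a fixed "filter-out" fraction the gene set shrinks geometrically, so a coarse pass needs only on the order of $\log_2 d$ SVM trainings, whereas one-by-one elimination over all $d$ genes would need on the order of $d$. Restricting the one-by-one pass to the small stage-one survivor set is what keeps this sketch cheap. Whether this matches the cost analysis in the paper is not something the abstract confirms.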
Bioinformatics, Microarray Gene Expression Data Analysis, Cancer Classification, Support Vector Machines, Gene Selection, Feature Selection, Recursive Feature Elimination

Z. Huang, Y. Tang and Y. Zhang, "Development of Two-Stage SVM-RFE Gene Selection Strategy for Microarray Expression Data Analysis," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. , pp. 365-381, 2007.