Experimental analysis of feature selection stability for high-dimension and low-sample size gene expression classification task
2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE) (2012)
Larnaca, Cyprus
Nov. 11, 2012 to Nov. 13, 2012
Blaise Hanczar, LIPADE, Université Paris Descartes, 45 rue des Saint-Pères, Paris, F-75006 France
Gene selection is a crucial step when building a classifier from microarray or metagenomic data. Because the number of observations is small, gene selection tends to be unstable: two gene subsets obtained from different datasets that address the same classification problem commonly do not overlap significantly. Although this is a crucial problem, little work has been done on selection stability. In this paper, we first present several stability quantification methods, then study how these measures vary with several parameters (dimensionality, sample size, feature distribution, selection threshold) on both artificial and real data, together with the resulting classification performance. Feature selection was performed with a t-test and classification with linear discriminant analysis. We point out a strong empirical correlation between the dimensionality/sample size ratio and selection instability.
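The abstract does not name the stability measures it uses, so the sketch below is only an illustration under stated assumptions: genes are ranked by the absolute value of a two-sample Student t statistic, the top k are kept (a fixed selection threshold), and stability is scored as the average pairwise Jaccard index between subsets selected on independently drawn datasets from the same artificial distribution. The helper names (`make_data`, `select_top_k`, `selection_stability`) and the Gaussian data generator are hypothetical, and the LDA classification step is not included.

```python
import itertools
import math
import random
import statistics


def t_statistic(a, b):
    """Two-sample Student t statistic with pooled variance (equal-variance assumption)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se if se > 0 else 0.0


def select_top_k(X, y, k):
    """Return the indices of the k features with the largest |t| (assumed threshold rule)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(t_statistic(a, b)), j))
    scores.sort(reverse=True)
    return frozenset(j for _, j in scores[:k])


def jaccard(s1, s2):
    """Jaccard index |s1 & s2| / |s1 | s2| between two feature subsets."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


def stability(subsets):
    """Average pairwise Jaccard index over all pairs of selected subsets (needs >= 2)."""
    pairs = list(itertools.combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_data(n, p, n_informative, shift, rng):
    """Hypothetical artificial data: unit-variance Gaussian features, with the
    first n_informative features shifted by `shift` in class 1; labels alternate."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(shift if (label == 1 and j < n_informative) else 0.0, 1.0)
               for j in range(p)]
        X.append(row)
        y.append(label)
    return X, y


def selection_stability(n, p, k=10, n_repeats=10, seed=0):
    """Select k genes on n_repeats independent datasets of n samples and p
    features, then return the average pairwise Jaccard stability."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_repeats):
        X, y = make_data(n, p, n_informative=k, shift=1.0, rng=rng)
        subsets.append(select_top_k(X, y, k))
    return stability(subsets)


if __name__ == "__main__":
    # Low dimensionality/sample size ratio (p/n = 0.5) vs. high ratio (p/n = 25).
    print("p/n = 0.5:", selection_stability(n=100, p=50))
    print("p/n = 25: ", selection_stability(n=20, p=500))
```

Under these assumptions, the second call, with a far larger dimensionality/sample size ratio, would be expected to give a noticeably lower stability score, in line with the trend the abstract reports; the exact values depend on the seed and the simulated effect size.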
Training, Stability criteria, Correlation, Indexes, Size measurement, Error analysis, dimensionality/sample size ratio, Feature selection, small sample, stability
B. Hanczar, "Experimental analysis of feature selection stability for high-dimension and low-sample size gene expression classification task," 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), Larnaca, Cyprus, 2012, pp. 350-355.