Issue No. 02 - April-June (2005 vol. 2)
DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2005.28
<p><b>Abstract</b>: Class prediction and feature selection are two learning tasks that are tightly coupled in the search for molecular profiles from microarray data. Researchers have become aware of how easy it is to incur selection bias, and complex validation setups are required to avoid overly optimistic estimates of the predictive accuracy of the models and incorrect gene selections. This paper describes a semisupervised pattern discovery approach that uses the by-products of complete validation studies on experimental setups for gene profiling. In particular, we introduce the study of the patterns of single-sample responses (sample-tracking profiles) to the gene selection process induced by typical supervised learning tasks in microarray studies. We build sample-tracking profiles from the aggregated off-training evaluation of SVM models with gene panels of increasing size. Genes are ranked by E-RFE, an entropy-based variant of recursive feature elimination for support vector machines (RFE-SVM). A Dynamic Time Warping (DTW) algorithm is then applied to define a metric between sample-tracking profiles. Unsupervised clustering based on the DTW metric automates the discovery of outliers and of subtypes with different molecular profiles. Applications are described on synthetic data and on two gene expression studies.</p>
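The profile-building, DTW, and clustering steps summarized in the abstract can be illustrated with a minimal sketch in Python (standard library only). It is not the authors' implementation, and several choices are assumptions: each sample-tracking profile is taken to be the sample's misclassification rate at each gene panel size, averaged over the test splits in which it appeared; DTW is the classic unconstrained version with an absolute-difference local cost; and clustering is single linkage cut at a user-chosen threshold, with singleton clusters flagged as outliers (the abstract does not name the clustering algorithm). The SVM training and E-RFE ranking that generate the records are not shown, and all function names and the record format are hypothetical.

```python
# Hypothetical sketch: sample-tracking profiles -> DTW distance -> clustering.
# Assumed: profiles are per-sample error rates; single linkage is a stand-in
# for whatever clustering the paper actually uses.
from collections import defaultdict
from math import inf


def sample_tracking_profiles(records, panel_sizes):
    """Aggregate off-training outcomes into one profile per sample.

    records: iterable of (sample_id, panel_size, misclassified) tuples, one per
    time the sample sat in a test split of a model trained on `panel_size` genes.
    Assumes every sample was tested at least once at every panel size.
    Returns {sample_id: [error rate for each size in panel_sizes]}.
    """
    wrong = defaultdict(int)
    seen = defaultdict(int)
    for sample, size, misclassified in records:
        wrong[sample, size] += int(misclassified)
        seen[sample, size] += 1
    samples = dict.fromkeys(sample for sample, _ in seen)  # keeps first-seen order
    return {s: [wrong[s, k] / seen[s, k] for k in panel_sizes] for s in samples}


def dtw_distance(a, b):
    """Classic unconstrained DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest predecessor: step in a only, in b only, or in both
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster_profiles(profiles, threshold):
    """Single-linkage clustering: link profiles whose DTW distance <= threshold."""
    ids = list(profiles)
    parent = {s: s for s in ids}

    def find(s):
        # union-find root lookup with path halving
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if dtw_distance(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(list)
    for s in ids:
        groups[find(s)].append(s)
    return list(groups.values())


# Usage with made-up profiles over four panel sizes (illustrative only).
profiles = {
    "a": [0.5, 0.2, 0.0, 0.0],
    "b": [0.5, 0.5, 0.2, 0.0],  # same shape as "a", but it converges later
    "c": [0.4, 0.1, 0.0, 0.0],
    "x": [0.9, 1.0, 1.0, 1.0],  # persistently misclassified sample
}
clusters = cluster_profiles(profiles, threshold=0.5)
outliers = [g[0] for g in clusters if len(g) == 1]
print(clusters)  # [['a', 'b', 'c'], ['x']]
print(outliers)  # ['x']
```

Profiles "a" and "b" differ pointwise yet sit at DTW distance 0, because warping absorbs the delay in "b"'s convergence; this insensitivity to where along the panel-size axis a sample's behavior changes is the property that motivates a DTW metric over a plain pointwise distance.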
Index Terms: Machine learning, data mining, classifier design and evaluation, feature evaluation and selection, pattern analysis, clustering, similarity measures, biology and genetics, bioinformatics databases.
Giuseppe Jurman, Maria Serafini, Stefano Merler, Cesare Furlanello, "Semisupervised Learning for Molecular Profiling", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 2, pp. 110-118, April-June 2005, doi:10.1109/TCBB.2005.28