Issue No. 01 - January-March (2010 vol. 7)
ISSN: 1545-5963
pp: 100-107
Ming T. Tan, University of Maryland, Baltimore
Zhenqiu Liu, University of Maryland, Baltimore
Shili Lin, Ohio State University, Columbus
The development of high-throughput technology has generated massive amounts of high-dimensional data, much of it discrete. Robust and efficient learning algorithms such as LASSO [1] are required for feature selection and overfitting control. However, most feature selection algorithms apply only to continuous data. In this paper, we propose a novel method for sparse support vector machines (SVMs) with L_{p} (p < 1) regularization. We develop efficient algorithms (LpSVM) for learning a classifier that is applicable to high-dimensional data sets containing both discrete and continuous data. The regularization parameters are estimated by maximizing the area under the ROC curve (AUC) on the cross-validation data. Experimental results on protein sequence and SNP data attest to the accuracy, sparsity, and efficiency of the proposed algorithm. Biomarkers identified with our methods are compared with those reported by other methods in the literature. A Matlab software package is available upon request.
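As a rough illustration of two ingredients named in the abstract, the sketch below computes an L_{p} penalty on a weight vector and the AUC of a set of classifier scores. It is not the authors' LpSVM algorithm or their Matlab package; the function names `lp_penalty` and `auc` are hypothetical, and the restriction to 0 < p <= 1 is an assumption made here for the example.

```python
from typing import Sequence


def lp_penalty(weights: Sequence[float], p: float) -> float:
    """Return sum_j |w_j|^p (assumed form of the penalty, with 0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("this sketch assumes 0 < p <= 1")
    return sum(abs(w) ** p for w in weights)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Two weight vectors with the same L1 norm: one dense, one sparse.
    dense, sparse = [1.0, 1.0], [2.0, 0.0]
    print(lp_penalty(dense, 1.0), lp_penalty(sparse, 1.0))  # L1 cannot tell them apart
    print(lp_penalty(dense, 0.5), lp_penalty(sparse, 0.5))  # p < 1 penalizes the sparse one less
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

With p = 1 the dense and sparse vectors receive the same penalty, whereas with p = 0.5 the sparse vector is penalized less, which is the intuition behind using p < 1 to encourage sparsity. Under the abstract's description, a score such as `auc` would be evaluated on held-out cross-validation folds for each candidate setting of the regularization parameters, and the setting with the highest value would be kept.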
Embedded method, feature selection, L_{p} regularization, SVM, SNP data analysis, protease data analysis.
Ming T. Tan, Zhenqiu Liu, Shili Lin, "Sparse Support Vector Machines with L_{p} Penalty for Biomarker Identification", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 100-107, January-March 2010, doi:10.1109/TCBB.2008.17