Issue No. 04 - October-December (2010 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.8
Qiang Cheng, Southern Illinois University, Carbondale
Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations, and contain significant noise, missing components, or outliers. Features extracted from high-dimensional data need to be discriminative and sparse, and must capture the essential characteristics of the data. In this paper, we present a way to construct multivariate features and then classify the data into proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic approach to correcting these biases. We use conjugate gradient-based primal-dual interior-point techniques for large-scale problems. We apply our procedure to microarray gene analysis. The effectiveness of our method is confirmed by experimental results.
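The idea of selecting a sparse feature subset and then correcting the bias in its weights can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a squared-error (regression) loss instead of a classification loss, uses a simple proximal-gradient (ISTA) solver instead of the conjugate gradient-based primal-dual interior-point method, and corrects bias by an ordinary least-squares refit on the selected features. All function names, the penalty value `lam`, and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrink every entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by proximal gradient descent.
    # (Assumption: a stand-in solver, not the interior-point method of the paper.)
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

def debias(X, y, w):
    # Refit unpenalized least squares on the selected features only, which
    # removes the shrinkage that the L1 penalty imposes on the kept weights.
    # (Assumption: one simple bias correction, not the paper's procedure.)
    support = np.flatnonzero(w)
    w_refit = np.zeros_like(w)
    if support.size:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        w_refit[support] = coef
    return w_refit

# Synthetic data with many more variables (p) than observations (n).
rng = np.random.default_rng(0)
n, p = 40, 200
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_sparse = lasso_ista(X, y, lam=0.5)
w_debiased = debias(X, y, w_sparse)
print("selected features:", np.count_nonzero(w_sparse), "of", p)
```

The refit keeps the same selected subset and can only lower the training residual, because the penalized solution is itself one candidate in the least-squares problem restricted to that subset.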
High-dimensional data, feature selection, persistence, bias, convex optimization, primal-dual interior-point optimization, cancer classification, microarray gene analysis.
Qiang Cheng, "A Sparse Learning Machine for High-Dimensional Data with Application to Microarray Gene Analysis", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 4, pp. 636-646, October-December 2010, doi:10.1109/TCBB.2009.8