Issue No. 4 - October-December 2010 (vol. 7)
pp. 636-646
Qiang Cheng , Southern Illinois University, Carbondale
ABSTRACT
Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations and often contain significant noise, missing components, or outliers. Features extracted from such data should be discriminative and sparse, and should capture the essential characteristics of the data. In this paper, we present a method for constructing multivariate features and then classifying the data into the proper classes. The resulting small subset of features is nearly optimal in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic approach to correcting these biases. For large-scale problems, we use conjugate gradient-based primal-dual interior-point techniques. We apply our procedure to microarray gene analysis, and experimental results confirm the effectiveness of the method.
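For readers who want a concrete feel for $l_1$-constrained sparse feature weighting in the "many more variables than observations" regime described above, the sketch below fits a lasso-style linear model with a plain proximal-gradient (ISTA) loop on synthetic data. This is a generic illustration of sparsity-inducing feature selection only, not the paper's conjugate gradient-based primal-dual interior-point solver or its bias-correction step; all function names, parameters, and data here are invented for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam=0.05, n_iter=500):
    """Minimize (1/2n) * ||Xw - y||^2 + lam * ||w||_1 via ISTA."""
    n, p = X.shape
    w = np.zeros(p)
    # Step size 1/L, where L = ||X||_2^2 / n is the gradient's Lipschitz constant.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
n, p = 40, 500                             # far more variables than observations
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:5] = 2.0                           # only 5 informative features
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(n))

w_hat = lasso_ista(X, y)
print("nonzero weights:", np.count_nonzero(np.abs(w_hat) > 1e-6))
```

The $l_1$ penalty drives most of the 500 estimated weights to exactly zero, concentrating mass on the informative features; the shrinkage that produces this sparsity is also the source of the estimation bias that the paper's procedure goes on to correct.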
INDEX TERMS
High-dimensional data, feature selection, persistence, bias, convex optimization, primal-dual interior-point optimization, cancer classification, microarray gene analysis.
CITATION
Qiang Cheng, "A Sparse Learning Machine for High-Dimensional Data with Application to Microarray Gene Analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 4, pp. 636-646, October-December 2010, doi:10.1109/TCBB.2009.8
REFERENCES
[1] D. Ghosh, "Singular Value Decomposition Regression Modeling for Classification of Tumors from Microarray Experiments," Proc. Pacific Symp. Biocomputing, pp. 11-462-11-467, 2002.
[2] J. Fan and Y. Fan, "High Dimensional Classification Using Features Annealed Independence Rules," Annals of Statistics, vol. 36, pp. 2605-2637, 2008.
[3] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene Selection for Cancer Classification Using Support Vector Machines," Machine Learning, vol. 46, pp. 389-422, 2002.
[4] V.N. Vapnik, Statistical Learning Theory. Wiley-Interscience, 1998.
[5] V.N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1999.
[6] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer, 2001.
[7] T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer, and D. Haussler, "Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data," Bioinformatics, vol. 16, pp. 906-914, 2000.
[8] I.T. Jolliffe, Principal Component Analysis. Springer-Verlag, 1986.
[9] R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu, "Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression," Proc. Nat'l Academy of Sciences USA, vol. 99, pp. 6567-6572, 2002.
[10] P. Bickel and E. Levina, "Some Theory of Fisher's Linear Discriminant Function, 'Naive Bayes', and Some Alternatives Where There Are Many More Variables Than Observations," Bernoulli, vol. 10, pp. 989-1010, 2004.
[11] M. Wu, B. Scholkopf, and G. Bakir, "A Direct Method for Building Sparse Kernel Learning Algorithms," J. Machine Learning Research, vol. 7, pp. 603-624, 2006.
[12] J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani, "1-Norm Support Vector Machines," Proc. Neural Information Processing Systems, 2003.
[13] J. Fan and R. Li, "Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery," Proc. Int'l Congress of Mathematicians, 2006.
[14] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2004.
[15] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Springer, 1998.
[16] P. Geladi and B. Kowalski, "Partial Least-Squares Regression: A Tutorial," Analytica Chimica Acta, vol. 185, pp. 1-17, 1986.
[17] M. Barker and W. Rayens, "Partial Least Squares for Discrimination," J. Chemometrics, vol. 17, no. 3, pp. 166-173, 2003.
[18] K.-C. Li, "Sliced Inverse Regression for Dimension Reduction (with Discussion)," J. Am. Statistical Assoc., vol. 86, pp. 316-342, 1991.
[19] E. Bair, T. Hastie, P. Debashis, and R. Tibshirani, "Prediction by Supervised Principal Components," J. Am. Statistical Assoc., vol. 101, pp. 119-137, 2006.
[20] D. Nguyen and D. Rocke, "Tumor Classification by Partial Least Squares Using Microarray Gene Expression Data," Bioinformatics, vol. 18, pp. 39-50, 2002.
[21] X. Huang and W. Pan, "Linear Regression and Two-Class Classification with Gene Expression Data," Bioinformatics, vol. 19, pp. 2072-2078, 2003.
[22] F. Chiaromonte and J. Martinelli, "Dimension Reduction Strategies for Analyzing Global Gene Expression Data with a Response," Math. Biosciences, vol. 176, pp. 123-144, 2002.
[23] A. Antoniadis, S. Lambert-Lacroix, and F. Leblanc, "Effective Dimension Reduction Methods for Tumor Classification Using Gene Expression Data," Bioinformatics, vol. 19, pp. 563-570, 2003.
[24] E. Bura and R. Pfeiffer, "Graphical Methods for Class Prediction Using Dimension Reduction Techniques on DNA Microarray Data," Bioinformatics, vol. 19, pp. 1252-1258, 2003.
[25] A. Martinez and A. Kak, "PCA versus LDA," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, Feb. 2001.
[26] T. Golub et al., "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring," Science, vol. 286, pp. 531-537, http://www.broad.mit.edu/cgi-bin/cancerdatasets.cgi, 1999.
[27] L. Shen and E. Tan, "PLS and SVD Based Penalized Logistic Regression for Cancer Classification Using Microarray Data," Proc. Third Asia-Pacific Bioinformatics Conf., P. Chen and L. Wong, eds., pp. 219-228, Jan. 2005.
[28] S. Fidler, D. Skocaj, and A. Leonardis, "Combining Reconstructive and Discriminative Subspace Methods for Robust Classification and Regression by Subsampling," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 3, pp. 337-350, Mar. 2006.
[29] E. Greenshtein, "Best Subset Selection, Persistence in High-Dimensional Statistical Learning and Optimization under $l_1$ Constraint," Annals of Statistics, vol. 34, no. 5, pp. 2367-2386, 2006.
[30] D. Foster and E. George, "The Risk Inflation Criterion for Multiple Regression," Annals of Statistics, vol. 22, pp. 1947-1975, 1994.
[31] D. Donoho and X. Huo, "Uncertainty Principles and Ideal Atomic Decomposition," IEEE Trans. Information Theory, vol. 47, no. 7, pp. 2845-2862, Nov. 2001.
[32] M. Elad and A.M. Bruckstein, "A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases," IEEE Trans. Information Theory, vol. 48, no. 9, pp. 2558-2567, Sept. 2002.
[33] D. Donoho, "For Most Large Underdetermined Systems of Linear Equations the Minimal $l_1$-Norm Solution Is Also the Sparsest Solution," Comm. Pure and Applied Math., vol. 59, no. 6, pp. 797-829, 2006.
[34] P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection. Wiley, 1987.
[35] S. Dudoit, J. Fridlyand, and T. Speed, "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," J. Am. Statistical Assoc., vol. 97, pp. 77-87, 2002.
[36] D. Singh et al., "Gene Expression Correlates of Clinical Prostate Cancer Behavior," Cancer Cell, vol. 1, pp. 203-209, http://www.broad.mit.edu/cgi-bin/cancerdatasets.cgi, 2002.
[37] J. Welsh et al., "Analysis of Gene Expression Identifies Candidate Markers and Pharmacological Targets in Prostate Cancer," Cancer Research, vol. 61, pp. 5974-5978, 2001.
[38] http://www.chestsurg.org, 2009.