Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data
April-June 2005 (vol. 2 no. 2)
pp. 166-175

Abstract: This paper presents the use of penalized logistic regression for cancer classification from microarray expression data. Two dimension reduction methods are each combined with penalized logistic regression to improve both classification accuracy and computational speed. Two other machine-learning methods, support vector machines and least-squares regression, are used for comparison. The proposed methods achieve results equal to or better than those of the comparison methods. They also have the advantage that the output probability is given explicitly and the regression coefficients are easier to interpret. Several other aspects pertinent to applying these methods to cancer classification, such as the selection of penalty parameters and components, are also discussed.
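
To make the pipeline described in the abstract concrete, the following is a minimal sketch of one possible reading of it: reduce the gene dimension with a singular value decomposition (one of the techniques listed in the index terms), then fit a penalized logistic regression on the reduced components. It is not the authors' implementation. The ridge (L2) penalty, the Newton-Raphson fitting loop, the function names, the parameter values (`k`, `lam`), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def svd_reduce(X, k):
    """Project centered samples onto the top-k right singular vectors."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V_k = Vt[:k].T                      # (n_genes, k) projection basis
    return (X - mean) @ V_k, mean, V_k

def fit_penalized_logistic(Z, y, lam=1.0, n_iter=50, tol=1e-8):
    """Ridge-penalized logistic regression fitted by Newton-Raphson.

    Assumption: an L2 penalty with strength `lam`; the intercept
    (first coefficient) is left unpenalized.
    """
    n, k = Z.shape
    A = np.hstack([np.ones((n, 1)), Z])  # prepend intercept column
    P = lam * np.eye(k + 1)
    P[0, 0] = 0.0                        # do not penalize the intercept
    beta = np.zeros(k + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        w = p * (1.0 - p)
        grad = A.T @ (y - p) - P @ beta          # penalized score
        hess = A.T @ (A * w[:, None]) + P        # penalized information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predict_proba(X, mean, V_k, beta):
    """Explicit class-1 probability for new samples."""
    Z = (X - mean) @ V_k
    return 1.0 / (1.0 + np.exp(-(beta[0] + Z @ beta[1:])))

# Synthetic stand-in for microarray data (hypothetical, not from the paper):
# 40 samples, 500 "genes", the first 50 genes shifted in class 1.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 500))
X[y == 1, :50] += 3.0

Z, mean, V_k = svd_reduce(X, k=5)
beta = fit_penalized_logistic(Z, y, lam=1.0)
probs = predict_proba(X, mean, V_k, beta)
acc = np.mean((probs > 0.5) == (y == 1))
```

In this sketch the regression is fitted on only `k` components rather than on all genes, which is where the computational saving would come from, and `predict_proba` returns an explicit probability rather than a bare class label. Both `k` and `lam` would normally be chosen by cross-validation; the values here are placeholders.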

Index Terms:
Dimension reduction, penalized logistic regression, singular value decomposition, partial least squares, cancer classification, classifier design and evaluation, feature evaluation and selection, microarray data.
Citation:
Li Shen, Eng Chong Tan, "Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 2, pp. 166-175, April-June 2005, doi:10.1109/TCBB.2005.22