
Jieping Ye, Tao Li, Tao Xiong, Ravi Janardan, "Using Uncorrelated Discriminant Analysis for Tissue Classification with Gene Expression Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, no. 4, pp. 181-190, October-December 2004.
BibTeX
@article{10.1109/TCBB.2004.45,
  author    = {Jieping Ye and Tao Li and Tao Xiong and Ravi Janardan},
  title     = {Using Uncorrelated Discriminant Analysis for Tissue Classification with Gene Expression Data},
  journal   = {IEEE/ACM Transactions on Computational Biology and Bioinformatics},
  volume    = {1},
  number    = {4},
  issn      = {1545-5963},
  year      = {2004},
  pages     = {181--190},
  doi       = {http://doi.ieeecomputersociety.org/10.1109/TCBB.2004.45},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
}
Keywords: Microarray data analysis, discriminant analysis, generalized singular value decomposition, classification.
Abstract—The classification of tissue samples based on gene expression data is an important problem in the medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high (in the thousands) compared to the number of data samples (in the tens or low hundreds); that is, the data dimension is large compared to the number of data points (such data are said to be undersampled). To cope with the performance and accuracy problems associated with high dimensionality, it is commonplace to apply a preprocessing step that transforms the data to a space of significantly lower dimension with limited loss of the information present in the original data. Linear Discriminant Analysis (LDA) is a well-known technique for dimension reduction and feature extraction, but it is not applicable to undersampled data because of singularity problems associated with the matrices in the underlying representation. This paper presents a dimension reduction and feature extraction scheme, called Uncorrelated Linear Discriminant Analysis (ULDA), for undersampled problems and illustrates its utility on gene expression data. ULDA employs the Generalized Singular Value Decomposition method to handle undersampled data, and the features that it produces in the transformed space are uncorrelated, which makes it attractive for gene expression data. The properties of ULDA are established rigorously, and extensive experimental results on gene expression data are presented to illustrate its effectiveness in classifying tissue samples. These results provide a comparative study of various state-of-the-art classification methods on well-known gene expression data sets.
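The idea of extracting uncorrelated discriminant features from undersampled data can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract says ULDA relies on the Generalized Singular Value Decomposition, whereas the sketch below assumes a simpler formulation built from two ordinary SVDs that yields features whose total scatter is the identity on the training data. The function name ulda_fit, the tol parameter, and the toy data are all hypothetical.

```python
import numpy as np


def ulda_fit(X, y, tol=1e-10):
    """Sketch of an uncorrelated LDA projection (assumed SVD-based variant).

    X: (n_samples, n_genes) data matrix; y: class labels.
    Returns the projection matrix G (n_genes, q) and the global mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    mean = X.mean(axis=0)

    # H_t: centered data (genes x samples); total scatter is H_t @ H_t.T,
    # which is singular when there are more genes than samples.
    Ht = (X - mean).T
    # H_b: weighted, centered class means; between-class scatter is H_b @ H_b.T.
    Hb = np.column_stack([
        np.sqrt(np.sum(y == c)) * (X[y == c].mean(axis=0) - mean)
        for c in classes
    ])

    # Reduced SVD of H_t; keep only directions with nonzero singular values,
    # which sidesteps inverting the singular total scatter matrix.
    U, s, _ = np.linalg.svd(Ht, full_matrices=False)
    t = int(np.sum(s > tol * s[0]))
    U, s = U[:, :t], s[:t]

    # Whiten the total scatter, then find the between-class directions.
    B = (U.T @ Hb) / s[:, None]
    P, sb, _ = np.linalg.svd(B, full_matrices=False)
    q = int(np.sum(sb > tol * sb[0]))  # at most (number of classes - 1)

    G = (U / s) @ P[:, :q]
    return G, mean


# Hypothetical toy data: 30 samples, 200 "genes", 3 classes (undersampled).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))
y = np.repeat([0, 1, 2], 10)
X[y == 1] += 0.5
X[y == 2] -= 0.5

G, mean = ulda_fit(X, y)
Z = (X - mean) @ G  # reduced features
# The total scatter of the reduced features should be close to the identity,
# i.e. the extracted features are mutually uncorrelated on the training data.
print(np.round(Z.T @ Z, 6))
```

Working in the range of the centered data avoids any inverse of the full gene-by-gene scatter matrix, which is the step that fails for classical LDA when genes outnumber samples. A classifier such as nearest neighbor could then be applied to the reduced features Z; that choice is left open here.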