Metasample-Based Sparse Representation for Tumor Classification
September/October 2011 (vol. 8 no. 5)
pp. 1273-1282
Chun-Hou Zheng, Qufu Normal University, Rizhao and The Hong Kong Polytechnic University, Hong Kong
Lei Zhang, The Hong Kong Polytechnic University, Hong Kong
To-Yee Ng, The Hong Kong Polytechnic University, Hong Kong
Simon C.K. Shiu, The Hong Kong Polytechnic University, Hong Kong
De-Shuang Huang, Tongji University, Shanghai
Reliable and accurate identification of tumor type is crucial to the proper treatment of cancer. In recent years, sparse representation (SR) by l_1-norm minimization has been shown to be robust to noise, outliers, and even incomplete measurements, and it has been used successfully for classification. This paper presents a new SR-based method for tumor classification using gene expression data. A set of metasamples is extracted from the training samples, and an input test sample is then represented as a linear combination of these metasamples by the l_1-regularized least squares method. Classification is performed using a discriminant function defined on the representation coefficients. Because l_1-norm minimization yields a sparse solution, the proposed method is called metasample-based SR classification (MSRC). Extensive experiments on publicly available gene expression data sets show that MSRC is efficient for tumor classification and achieves higher accuracy than many existing representative schemes.
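
The pipeline described in the abstract (extract metasamples, solve an l_1-regularized least squares problem, classify from the coefficients) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how metasamples are extracted or how the discriminant function is defined, so the sketch assumes per-class SVD for the metasamples and a class-wise reconstruction residual for the decision rule. It also uses a plain iterative soft-thresholding (ISTA) loop as a stand-in l_1 solver. The function names, the number of metasamples k, and the regularization weight lam are all hypothetical.

```python
import numpy as np

def extract_metasamples(X, y, k):
    """Assumed extraction step: top-k left singular vectors per class.

    X is a genes-by-samples matrix; y holds one class label per column.
    Returns the stacked metasample matrix and the class label of each column.
    """
    blocks, labels = [], []
    for c in np.unique(y):
        U, _, _ = np.linalg.svd(X[:, y == c], full_matrices=False)
        kc = min(k, U.shape[1])
        blocks.append(U[:, :kc])
        labels.extend([c] * kc)
    return np.hstack(blocks), np.array(labels)

def l1_least_squares(A, b, lam=0.01, n_iter=500):
    """ISTA for min_x 0.5*||A x - b||_2^2 + lam*||x||_1 (stand-in solver)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - b))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x

def classify(A, labels, b, lam=0.01):
    """Assumed decision rule: pick the class with the smallest residual
    when b is reconstructed from that class's coefficients alone."""
    x = l1_least_squares(A, b, lam)
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(b - A[:, labels == c] @ x[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]
```

A typical call would build `A, labels = extract_metasamples(X_train, y_train, k)` once and then run `classify(A, labels, b)` for each test profile `b`. The choice of k and lam would normally be tuned on held-out data.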

Index Terms:
Tumor classification, sparse representation, metasample, gene expression data.
Citation:
Chun-Hou Zheng, Lei Zhang, To-Yee Ng, Simon C.K. Shiu, De-Shuang Huang, "Metasample-Based Sparse Representation for Tumor Classification," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1273-1282, Sept.-Oct. 2011, doi:10.1109/TCBB.2011.20