Issue No.03 - July-September (2008 vol.5)
pp: 368-384
George Lee, Rutgers University, Piscataway
Carlos Rodriguez, University of Puerto Rico, Mayagüez
Anant Madabhushi, Rutgers University, Piscataway
The recent explosion in the procurement and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. A major limitation in the ability to accurately classify these high-dimensional datasets stems from the "curse of dimensionality", which arises when the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts at dealing with this issue have mostly centered on the use of one dimensionality reduction (DR) scheme, Principal Component Analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, linear PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent underlying nonlinear structure associated with most biomedical data. The motivation behind this work is to identify the appropriate DR methods for the analysis of high-dimensional gene- and protein-expression studies. Toward this end, we empirically and rigorously compare three nonlinear DR schemes (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, Multidimensional Scaling) with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable.
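The contrast between Euclidean distance and distance measured along a nonlinear structure can be illustrated with a small sketch. The code below is not the authors' implementation and does not use expression data. It assumes a synthetic set of points on a circular arc, and all function names (`arc_points`, `knn_graph`, `geodesic`) are hypothetical. It compares the straight-line distance between the two ends of the arc with the shortest-path distance through a nearest-neighbour graph, which is the kind of geodesic estimate that Isomap-style methods build on.

```python
import heapq
import math


def arc_points(n=40, radius=1.0, sweep=1.5 * math.pi):
    """Sample n points along a circular arc (a curved 1-D structure in 2-D)."""
    return [(radius * math.cos(sweep * i / (n - 1)),
             radius * math.sin(sweep * i / (n - 1))) for i in range(n)]


def euclidean(p, q):
    """Straight-line distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def knn_graph(points, k=2):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: euclidean(points[i], points[j]))[:k]
        for j in nearest:
            d = euclidean(points[i], points[j])
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic(graph, source, target):
    """Shortest-path length from source to target (Dijkstra's algorithm)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            cand = dist + weight
            if cand < best.get(nbr, math.inf):
                best[nbr] = cand
                heapq.heappush(heap, (cand, nbr))
    return math.inf  # target not reachable


points = arc_points()
graph = knn_graph(points, k=2)
straight = euclidean(points[0], points[-1])   # ignores the curvature
along = geodesic(graph, 0, len(points) - 1)   # follows the arc
print(f"Euclidean: {straight:.3f}, graph geodesic: {along:.3f}")
```

On this toy arc the two end points look close under the Euclidean metric, although they are far apart along the curve. A linear projection that preserves Euclidean distances would therefore place them near each other, while a method based on neighbourhood-graph distances would not. Whether real expression data show the same effect is the empirical question the paper addresses.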
Bioinformatics (genome or protein) databases, Clustering, classification, and association rules, Data and knowledge visualization, Data mining, Feature extraction or construction
George Lee, Carlos Rodriguez, Anant Madabhushi, "Investigating the Efficacy of Nonlinear Dimensionality Reduction Schemes in Classifying Gene and Protein Expression Studies", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.5, no. 3, pp. 368-384, July-September 2008, doi:10.1109/TCBB.2008.36