Issue No.03 - May/June (2011 vol.8)
pp: 577-591
Alfredo Benso, Politecnico di Torino, Torino
Stefano Di Carlo, Politecnico di Torino, Torino
Gianfranco Politano, Politecnico di Torino, Torino
Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the performance of the classifiers, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that are unacceptable in real diagnostic applications. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative graph-based data structure, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.
Microarray, gene expression, classification, clinical diagnostics, graph theory.
Alfredo Benso, Stefano Di Carlo, Gianfranco Politano, "A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 3, pp. 577-591, May/June 2011, doi:10.1109/TCBB.2010.90
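The abstract describes the data structure only at a high level (genes as vertices, gene expression relationships as edges) and does not say how edges are defined or scored. The following Python sketch is therefore an illustration under stated assumptions, not the authors' algorithm: it assumes that an edge links two genes that are both differentially expressed in the same training sample, that the edge weight counts such samples, and that a sample is rejected as "unknown" when no class graph matches it well enough. The function names, the `threshold` and `min_score` parameters, and the scoring rule are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def active_genes(sample, threshold):
    """Return the sorted genes whose absolute expression reaches the threshold.

    `sample` maps a gene name to an expression value (e.g. a log ratio).
    Sorting keeps every gene pair in one canonical order.
    """
    return sorted(g for g, v in sample.items() if abs(v) >= threshold)


def build_graph(samples, threshold=1.0):
    """Build one class graph as a Counter mapping (gene_a, gene_b) -> weight.

    Assumption: two genes are connected when both are differentially
    expressed in the same sample; the weight counts such samples.
    """
    edges = Counter()
    for sample in samples:
        for pair in combinations(active_genes(sample, threshold), 2):
            edges[pair] += 1
    return edges


def match_score(graph, sample, threshold=1.0):
    """Fraction of the graph's total edge weight covered by the sample."""
    total = sum(graph.values())
    if total == 0:
        return 0.0
    pairs = combinations(active_genes(sample, threshold), 2)
    return sum(graph[p] for p in pairs if p in graph) / total


def classify(graphs, sample, threshold=1.0, min_score=0.5):
    """Return the best-matching class label, or None for an unknown sample.

    Returning None models the case where the sample does not belong to
    any of the available classes, instead of forcing a false positive.
    """
    scores = {label: match_score(g, sample, threshold)
              for label, g in graphs.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None
```

Under these assumptions, one graph is built per known class with `build_graph`, and `classify` is then called with a dictionary of class graphs and a new sample. The explicit `None` result is the point of the sketch: a sample that matches no class graph is reported as unclassified rather than assigned to the nearest class.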