Issue No.02 - April-June (2010 vol.7)
pp: 263-277
Yew-Soon Ong, Nanyang Technological University, Singapore
Jacek M. Zurada, University of Louisville, Louisville
Multiclass cancer classification on microarray data has made it feasible to diagnose all of the common malignancies in parallel. Using multiclass cancer feature selection approaches, it is now possible to identify genes relevant to a set of cancer types. However, besides identifying the genes relevant to the set of all cancer types, it would be more informative to biologists if the relevance of each gene to a specific cancer type or subset of cancer types could be pinpointed. In this paper, we introduce two new definitions of multiclass relevancy features, i.e., full class relevant (FCR) and partial class relevant (PCR) features. In particular, FCR features are genes that serve as candidate biomarkers for discriminating all cancer types. PCR features, on the other hand, are genes that distinguish subsets of cancer types. Subsequently, a Markov blanket embedded memetic algorithm is proposed for the simultaneous identification of both FCR and PCR genes. Results obtained on commonly used synthetic and real-world microarray data sets show that the proposed approach converges to valid FCR and PCR genes that would assist biologists in their research work. The identification of both FCR and PCR genes is found to improve classification accuracy on many microarray data sets. A further comparison with existing state-of-the-art feature selection algorithms also reveals the effectiveness and efficiency of the proposed approach.
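The distinction between FCR and PCR genes can be illustrated with a toy sketch. The code below is not the Markov blanket embedded memetic algorithm proposed in the paper; it is a simplified stand-in that assumes a gene "discriminates" two cancer types when their mean expression levels differ by at least a hypothetical threshold `min_gap`. The gene names, class labels, expression values, and function names are all invented for illustration.

```python
from itertools import combinations
from statistics import mean


def separable_pairs(values_by_class, min_gap=1.0):
    """Return the class pairs whose mean expression differs by at least min_gap.

    Assumption: a plain gap between class means stands in for a real
    relevance criterion; the paper itself relies on Markov blankets.
    """
    means = {label: mean(values) for label, values in values_by_class.items()}
    return {
        (a, b)
        for a, b in combinations(sorted(means), 2)
        if abs(means[a] - means[b]) >= min_gap
    }


def label_gene(values_by_class, min_gap=1.0):
    """Label a gene as FCR, PCR, or irrelevant under the toy criterion."""
    all_pairs = set(combinations(sorted(values_by_class), 2))
    hits = separable_pairs(values_by_class, min_gap)
    if hits == all_pairs:
        return "FCR"         # separates every pair of cancer types
    if hits:
        return "PCR"         # separates only some of the cancer types
    return "irrelevant"      # separates none of them


# Hypothetical expression values for three genes across three cancer types.
toy_data = {
    "gene_a": {"type_1": [0.1, 0.2], "type_2": [2.0, 2.1], "type_3": [4.0, 4.2]},
    "gene_b": {"type_1": [0.1, 0.2], "type_2": [0.2, 0.1], "type_3": [3.0, 3.1]},
    "gene_c": {"type_1": [1.0, 1.1], "type_2": [1.1, 1.0], "type_3": [1.0, 1.2]},
}

labels = {gene: label_gene(values) for gene, values in toy_data.items()}
print(labels)
```

Under these assumptions, `gene_a` separates every pair of types and is labeled FCR, `gene_b` only sets `type_3` apart from the other two and is labeled PCR, and `gene_c` separates nothing. A realistic criterion would also need to handle redundancy among genes, which this sketch ignores.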
Bioinformatics, microarray, multiclass cancer classification, feature/gene selection, memetic algorithm, Markov blanket.
Yew-Soon Ong, Jacek M. Zurada, "Identification of Full and Partial Class Relevant Genes," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 263-277, April-June 2010, doi:10.1109/TCBB.2008.105