Issue No. 01, January-March 2010 (vol. 7)
pp. 108-117
Yukyee Leung, The University of Hong Kong, HK
Yeungsam Hung, The University of Hong Kong, HK
Filters and wrappers are the two prevailing approaches to gene selection in microarray data analysis. Filters use statistical properties of each gene to represent its power to discriminate between classes. They are fast to compute, but the resulting predictions tend to be less accurate. Wrappers use a chosen classifier to select the genes that maximize classification accuracy, but their computational burden is formidable. Previous studies have combined the two, maximizing the classification accuracy of a chosen classifier over a filtered set of genes. The drawback of this single-filter-single-wrapper (SFSW) approach is that the classification accuracy depends on the particular filter and wrapper chosen. In this paper, a multiple-filter-multiple-wrapper (MFMW) approach is proposed that uses multiple filters and multiple wrappers to improve the accuracy and robustness of classification and to identify potential biomarker genes. Experiments on six benchmark data sets show that the MFMW approach outperforms the SFSW models (generated from all combinations of the filters and wrappers used in the corresponding MFMW model) in all cases and on all six data sets. Other studies have confirmed that some of the MFMW-selected genes are biomarkers of, or contribute to the development of, particular cancers.
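The abstract does not specify which filters, wrappers, or combination rule MFMW uses, so the following Python sketch only illustrates the general filter/wrapper pattern under explicit assumptions. It is not the authors' implementation. The assumptions are: two filters (an absolute Welch t-statistic and a signal-to-noise ratio), candidate genes taken as the union of each filter's top-k genes, one wrapper per classifier (greedy forward selection scored by leave-one-out accuracy, using a nearest-centroid and a 1-nearest-neighbour classifier), and a majority vote across the wrappers. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical sketch (not the paper's MFMW code): X is a list of samples, each a
# list of gene expression values; y holds binary labels 0/1, >= 2 samples per class.

def _split(values, labels):
    return ([v for v, c in zip(values, labels) if c == 0],
            [v for v, c in zip(values, labels) if c == 1])

def t_score(values, labels):
    """Filter 1 (assumed): absolute Welch t-statistic of one gene."""
    a, b = _split(values, labels)
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) or 1e-12
    return abs(mean(a) - mean(b)) / se

def snr_score(values, labels):
    """Filter 2 (assumed): signal-to-noise ratio |mean_a - mean_b| / (sd_a + sd_b)."""
    a, b = _split(values, labels)
    return abs(mean(a) - mean(b)) / ((stdev(a) + stdev(b)) or 1e-12)

def top_k_union(X, y, filters, k):
    """Pool the indices of the k best-ranked genes from every filter."""
    n_genes = len(X[0])
    pooled = set()
    for score in filters:
        scores = [score([row[g] for row in X], y) for g in range(n_genes)]
        # Stable sort: equal scores keep the lower gene index first.
        pooled.update(sorted(range(n_genes), key=lambda g: -scores[g])[:k])
    return sorted(pooled)

def nearest_centroid(train_X, train_y, x, genes):
    """Classifier 1: assign x to the class whose mean profile is closest."""
    best, best_d = None, math.inf
    for c in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == c]
        d = sum((mean(r[g] for r in rows) - x[g]) ** 2 for g in genes)
        if d < best_d:
            best, best_d = c, d
    return best

def one_nn(train_X, train_y, x, genes):
    """Classifier 2: copy the label of the closest training sample."""
    dists = [sum((r[g] - x[g]) ** 2 for g in genes) for r in train_X]
    return train_y[dists.index(min(dists))]

def loo_accuracy(X, y, genes, clf):
    """Leave-one-out accuracy of clf restricted to the given genes."""
    hits = sum(clf(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], genes) == y[i]
               for i in range(len(X)))
    return hits / len(X)

def forward_wrapper(X, y, candidates, clf, max_genes=3):
    """Wrapper: greedily add the gene that most improves LOO accuracy."""
    selected, best_acc = [], 0.0
    while len(selected) < max_genes:
        scored = [(loo_accuracy(X, y, selected + [g], clf), -g)
                  for g in candidates if g not in selected]
        if not scored:
            break
        acc, neg_g = max(scored)  # ties go to the lower gene index
        if acc <= best_acc:
            break  # stop once no remaining gene improves accuracy
        selected.append(-neg_g)
        best_acc = acc
    return selected, best_acc

def mfmw_predict(X, y, x_new, filters, classifiers, k=2):
    """Pool genes from all filters, run one wrapper per classifier, then vote."""
    candidates = top_k_union(X, y, filters, k)
    votes = []
    for clf in classifiers:
        genes, _ = forward_wrapper(X, y, candidates, clf)
        votes.append(clf(X, y, x_new, genes))
    # Majority vote; ties resolve to the smallest label (an odd count avoids ties).
    return max(sorted(set(votes)), key=votes.count)
```

Calling `mfmw_predict(X, y, x_new, [t_score, snr_score], [nearest_centroid, one_nn])` classifies one new sample. Because the wrapper selects genes using the same samples on which its leave-one-out accuracy is measured, that accuracy is optimistically biased. An honest estimate would repeat the whole filter-plus-wrapper selection inside an outer cross-validation loop.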
Filters, gene selection, hybrid classification models, microarray data classification, wrappers.
Yukyee Leung, Yeungsam Hung, "A Multiple-Filter-Multiple-Wrapper Approach to Gene Selection and Microarray Data Classification," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 108-117, January-March 2010, doi:10.1109/TCBB.2008.46