Issue No.02 - March/April (2011 vol.8)
pp: 452-463
Saras Saraswathi , Iowa State University, Ames, IA
Suresh Sundaram , Indian Institute of Technology, New Delhi, India
Narasimhan Sundararajan , Nanyang Technological University, Singapore
Michael Zimmermann , Iowa State University, Ames, IA
Marit Nilsen-Hamilton , Iowa State University, Ames, IA
A combination of Integer-Coded Genetic Algorithm (ICGA) and Particle Swarm Optimization (PSO), coupled with the neural-network-based Extreme Learning Machine (ELM), is used for gene selection and cancer classification. ICGA is used with PSO-ELM to select an optimal set of genes, which is then used to build a classifier; the resulting algorithm (ICGA-PSO-ELM) can handle sparse data and sample imbalance. We evaluate the performance of ICGA-PSO-ELM and compare our results with existing methods in the literature. An investigation into the functions of the selected genes, using a systems biology approach, revealed that many of the identified genes are involved in cell signaling and proliferation. An analysis of these gene sets shows a larger representation of genes that encode secreted proteins than is found in randomly selected gene sets. Secreted proteins constitute a major means by which cells interact with their surroundings. Mounting biological evidence has identified the tumor microenvironment as a critical factor that determines tumor survival and growth. Thus, the genes identified by this study that encode secreted proteins might provide important insights into the critical biological features of the microenvironment of each tumor type that allow these cells to thrive and proliferate.
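To make the ELM component of the abstract concrete, the following is a minimal sketch of a generic Extreme Learning Machine classifier, together with a fitness function that scores a candidate gene subset by validation accuracy. It is not the authors' implementation. The abstract does not spell out how ICGA, PSO, and ELM are wired together, so the two searches are left out, and every name here (`train_elm`, `predict_elm`, `subset_fitness`), the sigmoid activation, the uniform weight range, and the +1/-1 target coding are assumptions made for illustration.

```python
import numpy as np


def train_elm(X, y, n_hidden, n_classes, rng):
    """Fit a generic single-hidden-layer ELM (illustrative, not the paper's code)."""
    n_samples, n_features = X.shape
    # Hidden-layer weights and biases are drawn at random and never trained.
    # The uniform [-1, 1] range is an assumption of this sketch.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden activations
    # Targets coded +1 for the true class and -1 elsewhere (assumed coding).
    T = -np.ones((n_samples, n_classes))
    T[np.arange(n_samples), y] = 1.0
    # Output weights are the least-squares solution via the pseudoinverse.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def predict_elm(X, model):
    """Return the class with the largest output for each row of X."""
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)


def subset_fitness(gene_idx, X_train, y_train, X_val, y_val,
                   n_hidden, n_classes, seed=0):
    """Validation accuracy of an ELM trained only on the chosen gene columns.

    A search over gene subsets (hypothetical here) could call this function
    to score each candidate list of gene indices.
    """
    rng = np.random.default_rng(seed)
    model = train_elm(X_train[:, gene_idx], y_train, n_hidden, n_classes, rng)
    pred = predict_elm(X_val[:, gene_idx], model)
    return float(np.mean(pred == y_val))
```

In this sketch a candidate solution is simply a list of integer column indices, which is one natural reading of "integer-coded"; an outer search would propose such lists and keep those with the highest `subset_fitness`.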
Biology and genetics, classifier design and evaluation, feature evaluation and selection, neural nets.
Saras Saraswathi, Suresh Sundaram, Narasimhan Sundararajan, Michael Zimmermann, Marit Nilsen-Hamilton, "ICGA-PSO-ELM Approach for Accurate Multiclass Cancer Classification Resulting in Reduced Gene Sets in Which Genes Encoding Secreted Proteins Are Highly Represented", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 2, pp. 452-463, March/April 2011, doi:10.1109/TCBB.2010.13