Prediction of Cancer Class with Majority Voting Genetic Programming Classifier Using Gene Expression Data
Issue No. 02 - April-June (2009 vol. 6)
Topon Kumar Paul, Toshiba Corporation, Kanagawa
Hitoshi Iba, The University of Tokyo, Japan
To better understand different types of cancer and to find possible biomarkers for diseases, many researchers have recently been analyzing gene expression data with various machine learning techniques. However, because the number of training samples is very small compared to the huge number of genes, and because the classes are imbalanced, most of these methods suffer from overfitting. In this paper, we present a majority voting genetic programming classifier (MVGPC) for the classification of microarray data. Instead of a single rule or a single set of rules, we evolve multiple rules with genetic programming (GP) and then apply those rules to test samples, determining each sample's label by majority voting. In experiments on four public cancer data sets, including multiclass data sets, we found that the test accuracies of MVGPC are better than those of other methods, including AdaBoost with GP. Moreover, some of the genes that occur most frequently in the classification rules are known to be associated with the types of cancer studied in this paper.
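The voting step described above can be illustrated with a minimal sketch for the two-class case. Everything in it is an assumption made for illustration: the three rules are hand-written stand-ins for rules that the paper evolves with GP, the sign convention (output >= 0 means class 1) and the tie-breaking choice are hypothetical, and the expression values are made up.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A rule maps a sample's gene expression values to a real number.
# In MVGPC such rules are evolved by GP; these are hand-written placeholders.
Rule = Callable[[Sequence[float]], float]

# Hypothetical rules over three genes (indices 0, 1, 2).
rules: List[Rule] = [
    lambda x: x[0] - x[1],
    lambda x: x[2] - 0.5,
    lambda x: x[0] * x[2] - x[1],
]


def majority_vote(rules: Sequence[Rule], sample: Sequence[float]) -> int:
    """Label a sample with the class that most rules vote for.

    Assumed convention: a rule output >= 0 is a vote for class 1,
    otherwise a vote for class 0. Ties (possible only with an even
    number of rules) fall back to class 0.
    """
    votes = Counter(1 if rule(sample) >= 0 else 0 for rule in rules)
    return 1 if votes[1] > votes[0] else 0


# Made-up expression profiles for two test samples.
label_a = majority_vote(rules, [2.0, 1.0, 0.2])  # one rule votes for class 1
label_b = majority_vote(rules, [2.0, 1.0, 0.9])  # all three rules vote for class 1
```

Using an odd number of rules avoids ties in the two-class case. Multiclass data would need an additional scheme for combining votes, which this sketch does not cover.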
Classifier design and evaluation, data mining, feature extraction, evolutionary computing and genetic algorithm, gene expression, majority voting.
T. K. Paul and H. Iba, "Prediction of Cancer Class with Majority Voting Genetic Programming Classifier Using Gene Expression Data," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 353-367, 2009.