Methods for Identifying SNP Interactions: A Review on Variations of Logic Regression, Random Forest and Bayesian Logistic Regression
Issue No. 06 - November/December (2011 vol. 8)
DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.46
C. C. M. Chen , Discipline of Math. Sci., Queensland Univ. of Technol., Brisbane, QLD, Australia
H. Schwender , Dept. of Biostat., Johns Hopkins Univ., Baltimore, MD, USA
J. Keith , Sch. of Math. Sci., Monash Univ., Clayton, VIC, Australia
R. Nunkesser , Dept. of Comput. Sci., Tech. Univ. Dortmund, Dortmund, Germany
K. Mengersen , Discipline of Math. Sci., Queensland Univ. of Technol., Brisbane, QLD, Australia
P. Macrossan , Discipline of Math. Sci., Queensland Univ. of Technol., Brisbane, QLD, Australia
Advances in computational capacity, improved technology and the falling cost of genotyping mean that ever more data are being generated for studying genetic associations with diseases and disorders. However, the availability of large data sets brings with it the challenge of developing new methods of statistical analysis and modeling. Because a complex phenotype may result from a combination of multiple loci, various statistical methods have been developed for identifying epistatic (gene-gene interaction) effects. Among these methods, logic regression (LR) is an intriguing approach that incorporates tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, Random Forests, and with a Bayesian logistic regression with stochastic search variable selection.
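To make "tree-like structures" concrete, the following is a minimal sketch of how a logic regression model with a single logic tree turns SNP genotypes into a disease probability, using logit P(Y = 1) = beta0 + beta1 * L(X), where L is a Boolean combination of binary SNP variables. It assumes genotypes coded 0/1/2 (number of minor alleles) and split into dominant (`_D`) and recessive (`_R`) indicators; the tree, the variable names and the coefficients `beta0` and `beta1` are hypothetical, not taken from the study. Only evaluation of an already chosen tree is shown; the search over trees (for example by simulated annealing or MCMC) is omitted.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:
    """Leaf of a logic tree: one binary SNP indicator, optionally negated."""
    name: str
    negated: bool = False

    def eval(self, x):
        value = bool(x[self.name])
        return not value if self.negated else value


@dataclass
class Op:
    """Internal node of a logic tree: Boolean AND / OR of two subtrees."""
    op: str  # "and" or "or"
    left: "Node"
    right: "Node"

    def eval(self, x):
        l_val = self.left.eval(x)
        r_val = self.right.eval(x)
        return (l_val and r_val) if self.op == "and" else (l_val or r_val)


Node = Union[Var, Op]


def encode(genotypes):
    """Assumed coding: each 0/1/2 genotype becomes a dominant and a recessive indicator."""
    x = {}
    for snp, g in genotypes.items():
        x[f"{snp}_D"] = int(g >= 1)  # at least one minor allele
        x[f"{snp}_R"] = int(g == 2)  # two minor alleles
    return x


def predict_prob(tree, beta0, beta1, genotypes):
    """Logistic model on the 0/1 value of a single logic tree."""
    logic_value = 1 if tree.eval(encode(genotypes)) else 0
    eta = beta0 + beta1 * logic_value
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical tree: (SNP1_D AND SNP2_R) OR (NOT SNP3_D)
tree = Op("or",
          Op("and", Var("SNP1_D"), Var("SNP2_R")),
          Var("SNP3_D", negated=True))

p_case = predict_prob(tree, -2.0, 1.5, {"SNP1": 1, "SNP2": 2, "SNP3": 1})
p_other = predict_prob(tree, -2.0, 1.5, {"SNP1": 0, "SNP2": 2, "SNP3": 1})
print(round(p_case, 4))   # 0.3775 (tree is true, eta = -0.5)
print(round(p_other, 4))  # 0.1192 (tree is false, eta = -2.0)
```

Representing the tree as nested AND/OR nodes over binary leaves is what lets such models capture interactions among several SNPs within a single regression term; the LR variants named above differ mainly in how they search over and combine such trees.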
Keywords: Monte Carlo methods, belief networks, genetics, genomics, medical computing, molecular biophysics, molecular configurations, single nucleotide polymorphism, SNP interactions, logic regression, random forest, Bayesian logistic regression, tree-like structures, logic feature selection, Monte Carlo logic regression, Genetic Programming for Association Studies, Modified Logic Regression-Gene Expression Programming, real genotype data, stochastic search variable selection, regression analysis, mathematical model, Bayesian methods, genetic programming, candidate gene search, Bayesian logistic regression with stochastic search algorithm
C. C. M. Chen, H. Schwender, J. Keith, R. Nunkesser, K. Mengersen and P. Macrossan, "Methods for Identifying SNP Interactions: A Review on Variations of Logic Regression, Random Forest and Bayesian Logistic Regression," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1580-1591, 2011.