Methods for Identifying SNP Interactions: A Review on Variations of Logic Regression, Random Forest and Bayesian Logistic Regression
November/December 2011 (vol. 8 no. 6)
pp. 1580-1591
C. C. M. Chen, Discipline of Math. Sci., Queensland Univ. of Technol., Brisbane, QLD, Australia
H. Schwender, Dept. of Biostat., Johns Hopkins Univ., Baltimore, MD, USA
J. Keith, Sch. of Math. Sci., Monash Univ., Clayton, VIC, Australia
R. Nunkesser, Dept. of Comput. Sci., Tech. Univ. Dortmund, Dortmund, Germany
K. Mengersen, Discipline of Math. Sci., Queensland Univ. of Technol., Brisbane, QLD, Australia
P. Macrossan, Discipline of Math. Sci., Queensland Univ. of Technol., Brisbane, QLD, Australia
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of developing new methods of statistical analysis and modeling. Because a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach that incorporates tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and with a Bayesian logistic regression with stochastic search variable selection.
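
To make the "tree-like structures" of logic regression concrete, the following is a minimal sketch and not the authors' implementation. It assumes each SNP has already been coded as a binary indicator (for example, presence of at least one minor allele), and it uses a hypothetical tuple encoding of a logic tree and a hypothetical `evaluate` helper. The Boolean value computed for each individual is the kind of derived feature that logic regression would enter into a regression model as a predictor.

```python
# Illustrative sketch only: the tuple encoding and the function name are
# assumptions made for this example, not part of any published software.

def evaluate(tree, snps):
    """Evaluate a Boolean logic tree on one individual's binary SNP codes.

    A tree is either
      ("leaf", snp_index, negated)   -> one SNP indicator, optionally negated
      ("and" | "or", left, right)    -> Boolean combination of two subtrees
    """
    kind = tree[0]
    if kind == "leaf":
        _, idx, negated = tree
        value = bool(snps[idx])
        return (not value) if negated else value
    _, left, right = tree
    if kind == "and":
        return evaluate(left, snps) and evaluate(right, snps)
    if kind == "or":
        return evaluate(left, snps) or evaluate(right, snps)
    raise ValueError(f"unknown node type: {kind!r}")


# Hypothetical logic tree: (SNP0 AND NOT SNP1) OR SNP2
tree = ("or",
        ("and", ("leaf", 0, False), ("leaf", 1, True)),
        ("leaf", 2, False))

# Toy data (made up): one row per individual, one binary code per SNP.
individuals = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]

# One 0/1 feature per individual, usable as a predictor in a regression model.
features = [int(evaluate(tree, row)) for row in individuals]
print(features)
```

A search procedure would then propose changes to such trees and keep those that improve the model fit; that search step is deliberately left out of this sketch.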

Index Terms:
Monte Carlo methods, belief networks, genetics, genomics, medical computing, molecular biophysics, molecular configurations, single nucleotide polymorphism, SNP interactions, logic regression, random forest, Bayesian logistic regression, tree-like structures, logic feature selection, Monte Carlo logic regression, Genetic Programming for Association Studies, modified logic regression-gene expression programming, real genotype data, stochastic search variable selection, regression analysis, mathematical model, Bayesian methods, genetic programming, candidate gene search, Bayesian logistic regression with stochastic search algorithm
C. C. M. Chen, H. Schwender, J. Keith, R. Nunkesser, K. Mengersen, P. Macrossan, "Methods for Identifying SNP Interactions: A Review on Variations of Logic Regression, Random Forest and Bayesian Logistic Regression," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1580-1591, Nov.-Dec. 2011, doi:10.1109/TCBB.2011.46