Gene Mapping and Marker Clustering Using Shannon's Mutual Information
January-March 2006 (vol. 3 no. 1)
pp. 47-56
Finding the causal genetic regions underlying complex traits is one of the main aims in human genetics. In the context of complex diseases, which are believed to be controlled by multiple contributing loci of largely unknown effect and position, it is especially important to develop general yet sensitive methods for gene mapping. We discuss the use of Shannon's information theory for population-based gene mapping of discrete and quantitative traits and for marker clustering. Various measures of mutual information are employed to develop a comprehensive framework for gene mapping analyses. An algorithm aimed at finding so-called relevance chains of causal markers is proposed. Moreover, entropy measures are used in conjunction with multidimensional scaling to visualize clusters of genetic markers. The relevance chain algorithm successfully detected the two causal regions in a simulated scenario. The approach has also been applied to a published clinical study on autoimmune (Graves') disease. Results were consistent with those of standard statistical methods, and the approach additionally identified a locus of interest in the promoter region of the associated gene CTLA4. The developed software is freely available at
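As a rough illustration of the quantity the abstract builds on, the sketch below computes a plug-in (empirical frequency) estimate of Shannon's mutual information between a discrete marker and a discrete trait, and ranks markers by it. It is not the paper's relevance chain algorithm or its released software; the function names `mutual_information` and `rank_markers`, the marker labels, and the toy data are all assumptions made up for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_markers(markers, trait):
    """Sort marker names by estimated mutual information with the trait.

    `markers` maps a (hypothetical) marker name to its list of genotypes.
    """
    scores = {name: mutual_information(g, trait) for name, g in markers.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = case, 0 = control.
    trait = [0, 0, 0, 0, 1, 1, 1, 1]
    markers = {
        "snp_a": [0, 0, 0, 0, 1, 1, 1, 1],  # tracks the trait exactly
        "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the trait
    }
    for name, score in rank_markers(markers, trait):
        print(f"{name}: {score:.3f} bits")
```

Plug-in estimates of this kind are biased upward for small samples, so in practice a ranking like this would need a significance assessment (for example, by permuting the trait labels) before any marker is interpreted as associated.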


Index Terms:
Complex traits, genotype-phenotype association, information theory, relevance chains, SNPs.
Zaher Dawy, Bernhard Goebel, Joachim Hagenauer, Christophe Andreoli, Thomas Meitinger, Jakob C. Mueller, "Gene Mapping and Marker Clustering Using Shannon's Mutual Information," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 1, pp. 47-56, Jan.-March 2006, doi:10.1109/TCBB.2006.9