TreeDT: Tree Pattern Mining for Gene Mapping
April-June 2006 (vol. 3 no. 2)
pp. 174-185
We describe TreeDT, a novel association-based gene mapping method. Given a set of disease-associated haplotypes and a set of control haplotypes, TreeDT predicts likely locations of a disease susceptibility gene. TreeDT extracts, essentially in the form of haplotype trees, information about historical recombinations in the population: A haplotype tree constructed at a given chromosomal location is an estimate of the genealogy of the haplotypes. TreeDT constructs these trees for all locations on the given haplotypes and performs a novel disequilibrium test on each tree: Is there a small set of subtrees with relatively high proportions of disease-associated chromosomes, suggesting a shared genetic history for those chromosomes and a likely disease gene location? We give a detailed description of TreeDT and the tree disequilibrium tests, analyze the algorithm formally, and evaluate its performance experimentally on both simulated and real data sets. Experimental results demonstrate that TreeDT has high accuracy on difficult mapping tasks, and comparisons to other methods (EATDT, HPM, TDT) show that TreeDT is very competitive.
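
The idea of looking for subtrees enriched in disease-associated haplotypes can be illustrated with a small sketch. This is not the TreeDT algorithm: the abstract does not specify how the trees are built or how the disequilibrium statistic is defined, so the sketch assumes a plain prefix tree over the alleles read rightwards from one marker position, and uses the raw disease proportion of a subtree as a stand-in score. The function names `subtree_counts` and `best_subtree`, the `min_size` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict


def subtree_counts(haplotypes, labels, start):
    """Count [disease, control] haplotypes under every allele prefix.

    Each prefix of the alleles read rightwards from marker index `start`
    identifies one node (subtree) of an assumed prefix tree; this is a
    simplification, not the tree construction used by TreeDT.
    """
    counts = defaultdict(lambda: [0, 0])
    for hap, is_disease in zip(haplotypes, labels):
        suffix = hap[start:]
        for depth in range(1, len(suffix) + 1):
            counts[suffix[:depth]][0 if is_disease else 1] += 1
    return counts


def best_subtree(counts, min_size=2):
    """Return (prefix, disease proportion) of the most enriched subtree.

    Subtrees with fewer than `min_size` haplotypes are ignored; ties in
    proportion are broken in favour of the larger subtree. The proportion
    is only an illustrative score, not a significance test.
    """
    best = None
    for prefix, (disease, control) in counts.items():
        size = disease + control
        if size < min_size:
            continue
        key = (disease / size, size)
        if best is None or key > best[0]:
            best = (key, prefix)
    if best is None:
        return None
    return best[1], best[0][0]


# Toy data (made up): six haplotypes over four biallelic markers.
haps = ["1221", "1222", "1212", "2112", "2121", "1122"]
is_disease = [True, True, True, False, False, False]

counts = subtree_counts(haps, is_disease, start=0)
print(best_subtree(counts))
```

In a real analysis the score would be computed at every marker location and its significance assessed, for example by permuting the disease/control labels; that step is omitted here.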

Index Terms:
Biology and genetics, nonparametric statistics, nonnumerical algorithms and problems.
Petteri Sevon, Hannu Toivonen, Vesa Ollikainen, "TreeDT: Tree Pattern Mining for Gene Mapping," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 174-185, April-June 2006, doi:10.1109/TCBB.2006.28