A Consensus Tree Approach for Reconstructing Human Evolutionary History and Detecting Population Substructure
July/August 2011 (vol. 8 no. 4)
pp. 918-928
Ming-Chi Tsai, Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh
Guy Blelloch, Carnegie Mellon University, Pittsburgh
R. Ravi, Carnegie Mellon University, Pittsburgh
Russell Schwartz, Carnegie Mellon University, Pittsburgh
The random accumulation of variations in the human genome over time implicitly encodes a history of how human populations have arisen, dispersed, and intermixed since we emerged as a species. Reconstructing that history is a challenging computational and statistical problem but has important applications both to basic research and to the discovery of genotype-phenotype correlations. We present a novel approach to inferring human evolutionary history from genetic variation data. We use the idea of consensus trees, a technique generally used to reconcile species trees from divergent gene trees, adapting it to the problem of finding robust relationships within a set of intraspecies phylogenies derived from local regions of the genome. Validation on both simulated and real data shows the method to be effective in recapitulating the known true structure of the data and in producing results that closely match our best current understanding of human evolutionary history. Additional comparison with results of leading methods for the problem of population substructure assignment verifies that our method provides comparable accuracy in identifying meaningful population subgroups in addition to inferring relationships among them. The consensus tree approach thus provides a promising new model for the robust inference of substructure and ancestry from large-scale genetic variation data.
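The abstract names consensus trees without showing how one is built. For orientation only, the Python sketch below implements the classic majority-rule consensus (the construction of Margush and McMorris [20]). It is not the authors' own procedure, which this page does not describe. The sketch assumes each local tree is given as its set of clusters, i.e., the set of individuals below each internal node, and keeps every cluster that appears in more than half of the trees. The function names `majority_rule_consensus` and `parent_map` and the toy five-individual trees are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

# A cluster is the set of leaf labels (individuals) below one internal node.
Cluster = FrozenSet[str]


def majority_rule_consensus(trees: List[Set[Cluster]]) -> Set[Cluster]:
    """Keep every cluster that occurs in more than half of the input trees.

    Any two such clusters must co-occur in at least one input tree, so they
    are compatible (nested or disjoint) and together form a valid tree.
    """
    counts = Counter(cluster for tree in trees for cluster in tree)
    return {c for c, n in counts.items() if 2 * n > len(trees)}


def parent_map(clusters: Iterable[Cluster]) -> Dict[Cluster, Optional[Cluster]]:
    """Link each cluster to the smallest cluster strictly containing it (None for the root)."""
    ordered = sorted(clusters, key=len)
    parents: Dict[Cluster, Optional[Cluster]] = {}
    for i, c in enumerate(ordered):
        # Scanning in increasing size, the first strict superset is the tightest one.
        parents[c] = next((d for d in ordered[i + 1:] if c < d), None)
    return parents


if __name__ == "__main__":
    # Hypothetical local trees over five individuals A-E, written as cluster sets.
    t1 = {frozenset("AB"), frozenset("ABC"), frozenset("DE"), frozenset("ABCDE")}
    t2 = set(t1)
    t3 = {frozenset("AB"), frozenset("CD"), frozenset("CDE"), frozenset("ABCDE")}
    consensus = majority_rule_consensus([t1, t2, t3])
    for c in sorted(consensus, key=lambda c: (len(c), sorted(c))):
        print("".join(sorted(c)))
```

Running the script prints AB, DE, ABC, and ABCDE, one per line. The {C, D} and {C, D, E} groupings, each supported by only one of the three local trees, are dropped. `parent_map` then recovers the nesting, for example linking {A, B} to {A, B, C}. A strict-majority threshold is used because it guarantees that the surviving clusters are compatible. A lower threshold would need an extra compatibility check before the clusters could be assembled into a tree.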

[1] F.S. Collins, M. Morgan, and A. Patrinos, “The Human Genome Project: Lessons from Large-Scale Biology,” Science, vol. 300, no. 5617, pp. 286-290, Apr. 2003.
[2] J. Venter et al., “The Sequence of the Human Genome,” Science, vol. 291, no. 5507, pp. 1304-1351, 2001.
[3] S.T. Sherry, M.H. Ward, M. Kholodov, J. Baker, L. Phan, E.M. Smigielski, and K. Sirotkin, “dbSNP: The NCBI Database of Genetic Variation,” Nucleic Acids Research, vol. 29, no. 1, pp. 308-311, 2001.
[4] K.A. Frazer et al., “A Second Generation Human Haplotype Map of over 3.1 Million SNPs,” Nature, vol. 449, no. 7164, pp. 851-861, Oct. 2007.
[5] M. Jakobsson, S. Scholz, P. Scheet, R. Gibbs, J. VanLiere, H. Fung, Z. Szpiech, J. Degnan, K. Wang, R. Guerreiro, J. Bras, J. Schymick, D. Hernandez, B. Traynor, J. Simon-Sanchez, M. Matarin, A. Britton, J. van de Leemput, I. Rafferty, M. Bucan, H. Cann, J. Hardy, N. Rosenberg, and A. Singleton, “Genotype, Haplotype and Copy-Number Variation in Worldwide Human Populations,” Nature, vol. 451, no. 7181, pp. 998-1003, Feb. 2008.
[6] D. Behar, S. Rosset, J. Blue-Smith, O. Balanovsky, S. Tzur, D. Comas, R. Mitchell, L. Quintana-Murci, C. Tyler-Smith, and R. Wells, “The Genographic Project Public Participation Mitochondrial DNA Database,” PLoS Genetics, vol. 3, no. 6, p. e104, 2007.
[7] M. Nelson, K. Bryc, K. King, A. Indap, A. Boyko, J. Novembre, L. Briley, Y. Maruyama, D. Waterworth, G. Waeber, P. Vollenweider, J. Oksenberg, S. Hauser, H. Stirnadel, J. Kooner, J. Chambers, B. Jones, V. Mooser, C. Bustamante, A. Roses, D. Burns, M. Ehm, and E. Lai, “The Population Reference Sample, POPRES: A Resource for Population, Disease, and Pharmacological Genetics Research,” Am. J. Human Genetics, vol. 83, no. 3, pp. 347-358, 2008.
[8] D. Thomas and J. Witte, “Point: Population Stratification: A Problem for Case-Control Studies of Candidate-Gene Associations?,” Cancer Epidemiology Biomarkers and Prevention, vol. 11, no. 6, pp. 505-512, 2002.
[9] J. Pritchard, M. Stephens, and P. Donnelly, “Inference of Population Structure Using Multilocus Genotype Data,” Genetics, vol. 155, no. 2, pp. 945-959, June 2000.
[10] N. Patterson, A. Price, and D. Reich, “Population Structure and Eigenanalysis,” PLoS Genetics, vol. 2, no. 12, p. e190, Dec. 2006.
[11] K. Sohn and E. Xing, “Spectrum: Joint Bayesian Inference of Population Structure and Recombination Events,” Bioinformatics, vol. 23, no. 13, pp. i479-i489, 2007.
[12] S. Shringarpure and E. Xing, “mStruct: Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations,” Genetics, vol. 182, no. 2, pp. 575-593, 2009.
[13] M. Nei and A. Roychoudhury, “Genetic Relationship and Evolution of Human Races,” Evolutionary Biology, vol. 14, pp. 1-59, 1982.
[14] L. Jorde, M. Bamshad, W. Watkins, R. Zenger, A.E. Fraley, P. Krakowiak, K. Carpenter, H. Soodyall, T. Jenkins, and A. Rogers, “Origins and Affinities of Modern Humans: A Comparison of Mitochondrial and Nuclear Genetic Data,” Am. J. Human Genetics, vol. 57, pp. 523-538, 1995.
[15] R. Cann, M. Stoneking, and A. Wilson, “Mitochondrial DNA and Human Evolution,” Nature, vol. 325, no. 6099, pp. 31-36, 1987.
[16] S.A. Tishkoff, E. Dietzsch, W. Speed, A.J. Pakstis, J.R. Kidd, K. Cheung, B. Bonné-Tamir, A.S. Santachiara-Benerecetti, P. Moral, M. Krings, S. Pääbo, E. Watson, N. Risch, T. Jenkins, and K.K. Kidd, “Global Patterns of Linkage Disequilibrium at the CD4 Locus and Modern Human Origins,” Science, vol. 271, no. 5254, pp. 1380-1387, 1996.
[17] M.F. Hammer, A.B. Spurdle, T. Karafet, M.R. Bonner, E.T. Wood, A. Novelletto, P. Malaspina, R.J. Mitchell, S. Horai, T. Jenkins, and S.L. Zegura, “The Geographic Distribution of Human Y Chromosome Variation,” Genetics, vol. 145, no. 3, pp. 787-805, 1997.
[18] M. Nei and S. Kumar, Molecular Evolution and Phylogenetics. Oxford Univ. Press, 2000.
[19] S. Sridhar, F. Lam, G. Blelloch, R. Ravi, and R. Schwartz, “Direct Maximum Parsimony Phylogeny Reconstruction from Genotype Data,” BMC Bioinformatics, vol. 8, no. 1, p. 472, 2007.
[20] T. Margush and F. McMorris, “Consensus N-Trees,” Bull. Math. Biology, vol. 43, pp. 239-244, 1981.
[21] E. Adams, “N-Trees as Nestings: Complexity, Similarity, and Consensus,” J. Classification, vol. 3, no. 2, pp. 299-317, 1986.
[22] P. Grünwald, I. Myung, and M. Pitt, Advances in Minimum Description Length: Theory and Applications. The MIT Press, 2005.
[23] Y.J. Chu and T.H. Liu, “On the Shortest Arborescence of a Directed Graph,” Scientia Sinica, vol. 14, pp. 1396-1400, 1965.
[24] R.R. Hudson, “Generating Samples under a Wright-Fisher Neutral Model of Genetic Variation,” Bioinformatics, vol. 18, no. 2, pp. 337-338, Feb. 2002.
[25] A. Rambaut and N.C. Grassly, “Seq-Gen: An Application for the Monte Carlo Simulation of DNA Sequence Evolution Along Phylogenetic Trees,” Computer Applications in Biosciences, vol. 13, no. 3, pp. 235-238, 1997.
[26] M. Shriver and R. Kittles, “Genetic Ancestry and the Search for Personalized Genetic Histories,” Nature Rev. Genetics, vol. 5, pp. 611-618, 2004.
[27] M. Meila, “Comparing Clusterings—An Information Based Distance,” J. Multivariate Analysis, vol. 98, no. 5, pp. 873-895, 2007.
[28] M. Kayser, M. Krawczak, L. Excoffier, P. Dieltjes, D. Corach, V. Pascali, C. Gehrig, L. Bernini, J. Jespersen, E. Bakker, L. Roewer, and P. de Knijff, “An Extensive Analysis of Y-Chromosomal Microsatellite Haplotypes in Globally Dispersed Human Populations,” Am. J. Human Genetics, vol. 68, no. 4, pp. 990-1018, 2001.
[29] S. Tishkoff and B. Verrelli, “Patterns of Human Genetic Diversity: Implications for Human Evolutionary History and Disease,” Ann. Rev. Genomics and Human Genetics, vol. 4, no. 1, pp. 293-340, 2003.
[30] D. Reich, K. Thangaraj, N. Patterson, A. Price, and L. Singh, “Reconstructing Indian Population History,” Nature, vol. 461, no. 7263, pp. 489-494, 2009.
[31] S. Tishkoff and S. Williams, “Genetic Analysis of African Populations: Human Evolution and Complex Disease,” Nature Rev. Genetics, vol. 3, no. 8, pp. 611-621, 2002.
[32] M. He, J. Gitschier, T. Zerjal, P. de Knijff, C. Tyler-Smith, and Y. Xue, “Geographical Affinities of the HapMap Samples,” PLoS ONE, vol. 4, no. 3, p. e4684, 2009.
[33] D.E. Reich and D.B. Goldstein, “Genetic Evidence for a Paleolithic Human Population Expansion in Africa,” Proc. Nat'l Academy of Sciences USA, vol. 95, no. 14, pp. 8119-8123, 1998.
[34] L. Jin, M.L. Baskett, L.L. Cavalli-Sforza, L.A. Zhivotovsky, M.W. Feldman, and N.A. Rosenberg, “Microsatellite Evolution in Modern Humans: A Comparison of Two Data Sets from the Same Populations,” Annals of Human Genetics, vol. 64, no. 2, pp. 117-134, 2000.
[35] L.A. Zhivotovsky, “Estimating Divergence Time with the Use of Microsatellite Genetic Distances: Impacts of Population Growth and Gene Flow,” Molecular Biology and Evolution, vol. 18, no. 5, pp. 700-709, 2001.
[36] D. Gusfield, “Optimal, Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination,” J. Computer and System Sciences, vol. 70, no. 3, pp. 381-398, 2005.

Index Terms:
Biology and genetics, trees, information theory, graph algorithms.
Ming-Chi Tsai, Guy Blelloch, R. Ravi, Russell Schwartz, "A Consensus Tree Approach for Reconstructing Human Evolutionary History and Detecting Population Substructure," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 4, pp. 918-928, July-Aug. 2011, doi:10.1109/TCBB.2011.23