Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics
July-September 2006 (vol. 3 no. 3)
pp. 239-251
Given a set D of input sequences, a genealogy for D can be constructed backward in time using such evolutionary events as mutation, coalescence, and recombination. An ancestral configuration (AC) can be regarded as the multiset of all sequences present at a particular point in time in a possible genealogy for D. The complexity of computing the likelihood of observing D depends heavily on the total number of distinct ACs of D and, therefore, it is of interest to estimate that number. For D consisting of binary sequences of finite length, we consider the problem of enumerating exactly all distinct ACs. We assume that the root sequence type is known and that the mutation process is governed by the infinite-sites model. When there is no recombination, we construct a general method of obtaining closed-form formulas for the total number of ACs. The enumeration problem becomes much more complicated when recombination is involved. In that case, we devise a method of enumeration based on counting contingency tables and construct a dynamic programming algorithm for the approach. Last, we describe a method of counting the number of ACs that can appear in genealogies with at most a given number R of recombinations. Of particular interest is the case in which R is close to the minimum number of recombinations for D.
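
To make the notion of an AC concrete for the recombination-free case, the following is a minimal brute-force sketch in Python. It is not the closed-form method of the paper. It assumes that sequences are equal-length strings of "0" and "1", that the known root type is all zeros, and that the input is compatible with the infinite-sites model without recombination. The function name `count_ancestral_configurations` and the convention of counting both the sample D itself and the final single-root configuration are illustrative assumptions; the paper's counting convention may differ.

```python
from collections import Counter


def count_ancestral_configurations(sequences):
    """Count distinct multisets of sequences reachable backward in time.

    Assumptions (illustrative, not taken from the paper):
    - sequences are equal-length strings over {"0", "1"};
    - the root type is all zeros, and "1" marks a mutated site;
    - no recombination; only coalescence and mutation events occur.
    """
    start = tuple(sorted(sequences))  # canonical form of a multiset
    seen = {start}
    stack = [start]
    while stack:
        config = stack.pop()
        successors = []
        for seq, multiplicity in Counter(config).items():
            if multiplicity >= 2:
                # Coalescence: two identical sequences merge into one.
                nxt = list(config)
                nxt.remove(seq)
                successors.append(tuple(sorted(nxt)))
                continue
            # Mutation: a site can be reverted to the root state only
            # when this sequence is the sole carrier of that mutation.
            for site, allele in enumerate(seq):
                if allele != "1":
                    continue
                carriers = sum(1 for s in config if s[site] == "1")
                if carriers == 1:
                    reverted = seq[:site] + "0" + seq[site + 1:]
                    nxt = list(config)
                    nxt.remove(seq)
                    nxt.append(reverted)
                    successors.append(tuple(sorted(nxt)))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


if __name__ == "__main__":
    # Configurations: {01,10}, {00,10}, {00,01}, {00,00}, {00}
    print(count_ancestral_configurations(["10", "01"]))
```

Because every reachable configuration is stored explicitly, this search grows quickly with sample size, which is the motivation for closed-form counting.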

Index Terms:
Ancestral configurations, coalescent, recombination, contingency table, enumeration.
Yun S. Song, Rune Lyngsø, Jotun Hein, "Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 3, pp. 239-251, July-Sept. 2006, doi:10.1109/TCBB.2006.31