Issue No. 03 - July-September (2006 vol. 3)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2006.31
Given a set D of input sequences, a genealogy for D can be constructed backward in time using such evolutionary events as mutation, coalescent, and recombination. An ancestral configuration (AC) can be regarded as the multiset of all sequences present at a particular point in time in a possible genealogy for D. The complexity of computing the likelihood of observing D depends heavily on the total number of distinct ACs of D and, therefore, it is of interest to estimate that number. For D consisting of binary sequences of finite length, we consider the problem of enumerating exactly all distinct ACs. We assume that the root sequence type is known and that the mutation process is governed by the infinite-sites model. When there is no recombination, we construct a general method of obtaining closed-form formulas for the total number of ACs. The enumeration problem becomes much more complicated when recombination is involved. In that case, we devise a method of enumeration based on counting contingency tables and construct a dynamic programming algorithm for the approach. Last, we describe a method of counting the number of ACs that can appear in genealogies with at most a given number R of recombinations. Of particular interest is the case in which R is close to the minimum number of recombinations for D.
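The abstract does not spell out the enumeration algorithm, so the following is only an illustrative sketch of the general combinatorial subproblem it names: counting contingency tables, that is, nonnegative integer matrices with prescribed row and column sums, by a memoized (dynamic programming) recursion over columns. The function name `count_tables` and its parameters are hypothetical, and the code is not the authors' method for counting ACs.

```python
from functools import lru_cache


def count_tables(row_sums, col_sums):
    """Count nonnegative integer matrices with the given row and column sums.

    Illustrative only: a generic column-by-column dynamic program,
    not the algorithm from the paper.
    """
    if sum(row_sums) != sum(col_sums):
        return 0  # margins are inconsistent, so no table exists
    cols = tuple(col_sums)

    def fills(remaining, total):
        # Yield every way to split `total` over the rows such that
        # entry i does not exceed the remaining capacity of row i.
        if not remaining:
            if total == 0:
                yield ()
            return
        for x in range(min(remaining[0], total) + 1):
            for rest in fills(remaining[1:], total - x):
                yield (x,) + rest

    @lru_cache(maxsize=None)
    def count(j, remaining):
        # State: index of the next column to fill, and the row sums
        # still to be distributed over columns j, j+1, ...
        if j == len(cols):
            return 1 if all(r == 0 for r in remaining) else 0
        total = 0
        for column in fills(remaining, cols[j]):
            nxt = tuple(r - x for r, x in zip(remaining, column))
            total += count(j + 1, nxt)
        return total

    return count(0, tuple(row_sums))


if __name__ == "__main__":
    # 3x3 tables with all margins equal to 1 are the permutation matrices.
    print(count_tables((1, 1, 1), (1, 1, 1)))
```

Memoizing on the pair (column index, remaining row sums) is what makes this a dynamic program: different partial fillings that leave the same remaining margins share one subproblem.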
Ancestral configurations, coalescent, recombination, contingency table, enumeration.
R. Lyngsø, Y. S. Song and J. Hein, "Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 3, pp. 239-251, 2006.