Issue No. 02 - April-June (2004 vol. 1)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2004.23
We consider the following problem: Given a set of binary sequences, determine lower bounds on the minimum number of recombinations required to explain the history of the sample, under the infinite-sites model of mutation. The problem has implications for finding recombination hotspots and for the Ancestral Recombination Graph reconstruction problem. Hudson and Kaplan gave a lower bound based on the four-gamete test. In practice, their bound R_m often greatly underestimates the minimum number of recombinations. The problem was recently revisited by Myers and Griffiths, who introduced two new lower bounds R_h and R_s which are provably better, and also yield good bounds in practice. However, the worst-case complexities of their procedures for computing R_h and R_s are exponential and super-exponential, respectively. In this paper, we show that the number of nontrivial connected components, R_c, in the conflict graph for a given set of sequences, computable in time O(nm^2), is also a lower bound on the minimum number of recombination events. We show that in many cases, R_c is a better bound than R_h. The conflict graph was used by Gusfield et al. to obtain a polynomial time algorithm for the galled tree problem, which is a special case of the Ancestral Recombination Graph (ARG) reconstruction problem. Our results also offer some insight into the structural properties of this graph and are of interest for the general Ancestral Recombination Graph reconstruction problem.
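As an illustration of the quantity R_c described above, the following is a minimal sketch, not the authors' implementation. It assumes that the conflict graph has one node per site (column), that two sites are joined by an edge when they fail the four-gamete test (all of 00, 01, 10, 11 occur), and that a "nontrivial" component is one containing at least one edge. The function names `conflicts` and `count_nontrivial_components` are hypothetical. Testing all pairs of the m sites over n sequences takes O(nm^2) time, in line with the bound quoted in the abstract.

```python
from itertools import combinations


def conflicts(col_a, col_b):
    """Four-gamete test (assumed conflict rule): True if 00, 01, 10, 11 all occur."""
    return len(set(zip(col_a, col_b))) == 4


def count_nontrivial_components(sequences):
    """Count connected components of the conflict graph that contain an edge.

    `sequences` is a list of equal-length binary strings, one per sample.
    """
    m = len(sequences[0]) if sequences else 0
    # One column per site.
    columns = [[row[j] for row in sequences] for j in range(m)]

    # Union-find over sites.
    parent = list(range(m))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    has_edge = [False] * m
    for a, b in combinations(range(m), 2):
        if conflicts(columns[a], columns[b]):
            has_edge[a] = has_edge[b] = True
            parent[find(a)] = find(b)

    # Isolated sites (no conflicts) form trivial components and are skipped.
    return len({find(j) for j in range(m) if has_edge[j]})
```

For example, `count_nontrivial_components(["00", "01", "10", "11"])` finds a single conflicting pair of sites, so under the assumptions above the sketch reports one nontrivial component.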
Recombination, phylogenetic networks, ancestral recombination graph, haplotypes, lower bounds, conflict graph, NP-completeness.
V. Bafna and V. Bansal, "The Number of Recombination Events in a Sample History: Conflict Graph and Lower Bounds," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, no. 2, pp. 78-90, 2004.