ISSN: 1545-5963

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.109

Yujin Chung, University of Wisconsin, Madison

Nicole T. Perna, University of Wisconsin, Madison

Cécile Ané, University of Wisconsin, Madison

ABSTRACT

Ancestral recombination events can cause the underlying genealogy of a site to vary along the genome. We consider Bayesian models to simultaneously detect recombination breakpoints in very long sequence alignments and estimate the phylogenetic tree of each block between breakpoints. The models we consider use a dissimilarity measure between trees in their prior distribution to favor similar trees at neighboring loci. We show empirical evidence in Enterobacteria that neighboring genomic regions have similar trees. The main hurdle to using such models is the need to properly calculate the normalizing function for the prior probabilities on trees. In this work, we quantify the impact of approximating this normalizing function as done in biomc2, a hierarchical Bayesian method to detect recombination based on distance between tree topologies. We then derive an algorithm to calculate the normalizing function exactly, for a Gibbs distribution based on the Robinson-Foulds (RF) distance between gene trees at neighboring loci. At the core is the calculation of the joint distribution of the shape of a random tree and its RF distance to a fixed tree. We also propose fast approximations to the normalizing function, which are shown to be very accurate with little impact on the Bayesian inference.
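As a rough illustration of what the normalizing function involves, the sketch below computes it by brute force on a toy example. It rests on assumptions that are not taken from the paper: trees are rooted (so the RF distance is the number of clades found in exactly one of the two trees), the Gibbs prior has the form exp(-beta * d_RF(T, T0)), and every function name is hypothetical. Full enumeration is only feasible for a handful of taxa, which is why an exact algorithm or a fast approximation is needed in practice.

```python
import math
from collections import Counter


def insert_leaf(tree, leaf):
    """Yield every rooted binary tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # attach on the edge above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)


def all_trees(leaves):
    """Enumerate all rooted binary topologies on `leaves` (nested tuples)."""
    if len(leaves) == 1:
        return [leaves[0]]
    trees = []
    for smaller in all_trees(leaves[:-1]):
        trees.extend(insert_leaf(smaller, leaves[-1]))
    return trees


def clades(tree):
    """Return the set of clades (as frozensets) below the root of `tree`."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = walk(node[0]) | walk(node[1])
        found.add(members)
        return members

    found.discard(walk(tree))  # the root clade is common to all trees
    return found


def rf_distance(tree1, tree2):
    """Rooted RF distance: number of clades present in exactly one tree."""
    return len(clades(tree1) ^ clades(tree2))


def distance_counts(reference, leaves):
    """Count how many trees lie at each RF distance from `reference`."""
    return Counter(rf_distance(t, reference) for t in all_trees(leaves))


def normalizing_function(reference, leaves, beta):
    """Sum of exp(-beta * d_RF(T, reference)) over all trees T (assumed prior form)."""
    counts = distance_counts(reference, leaves)
    return sum(n * math.exp(-beta * d) for d, n in counts.items())


leaves = list("abcde")
caterpillar = (((("a", "b"), "c"), "d"), "e")
two_cherries = ((("a", "b"), ("c", "d")), "e")
for name, ref in [("caterpillar", caterpillar), ("two_cherries", two_cherries)]:
    counts = dict(sorted(distance_counts(ref, leaves).items()))
    print(name, counts, normalizing_function(ref, leaves, beta=1.0))
```

In this five-taxon toy case the two reference trees have different shapes and give different distance counts. The normalizing function therefore depends on the shape of the fixed tree, not only on the number of taxa.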

INDEX TERMS

Topology, Bioinformatics, Genomics, Phylogeny, Radio frequency, Vegetation, Biological system modeling, gene tree discordance, phylogenetic tree, recombination, Robinson-Foulds distance, normalizing function

CITATION

Y. Chung, N. T. Perna and C. Ané, "Computing the Joint Distribution of Tree Shape and Tree Distance for Gene Tree Inference and Recombination Detection," in *IEEE/ACM Transactions on Computational Biology and Bioinformatics*. doi:10.1109/TCBB.2013.109
