IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2010, vol. 7, Issue No. 04, October-December

pp. 704-718

Sagi Snir, UC Berkeley, Berkeley

Satish Rao, UC Berkeley, Berkeley

DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.133

ABSTRACT

Accurate phylogenetic reconstruction methods are currently limited to a few dozen taxa at most. Supertree methods construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Hence, constructing the tree of life over roughly one and a half million species will inevitably require applying a supertree method to the output of accurate methods. Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods, and both theoretical and experimental results have been reported. Nevertheless, handling false, conflicting quartet trees remains problematic. In this paper, we describe an algorithm that constructs a tree from a set of input quartet trees even when a significant fraction of them are erroneous. We show empirically that conflicts in the input are handled satisfactorily and that our algorithm significantly outperforms, and runs faster than, the Matrix Representation with Parsimony (MRP) methods, which have previously been the most successful in dealing with supertrees. Our algorithm is a divide and conquer algorithm whose divide step uses a semidefinite programming (SDP) formulation of MaxCut. This builds on our previous work [29] on piecing together trees from rooted triplet trees. The recursion for unrooted quartets, however, is more complicated: even for a completely consistent set of quartet trees the problem is NP-hard, whereas for rooted triplets a linear-time algorithm exists. This complexity leads to several issues, and to some solutions of possible independent interest.
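The divide step can be illustrated with a small sketch. It assumes a graph construction in the spirit of the approach described above, not taken from this abstract: each quartet ab|cd contributes "good" edges between taxa on opposite sides (a-c, a-d, b-c, b-d) and "bad" edges between sibling taxa (a-b, c-d), and a cut is scored by the ratio of good to bad edges it crosses, so a cut separating {a, b} from {c, d} crosses four good edges and no bad ones. In place of the SDP formulation of MaxCut, the sketch swaps in exhaustive search over all bipartitions, which is exponential in the number of taxa and only usable on toy inputs. The names `quartet_graph`, `cut_score`, and `best_cut` are hypothetical.

```python
from itertools import combinations

# Hypothetical sketch of a quartet-based divide step (not the paper's code).
# A quartet topology ab|cd is written as (("a", "b"), ("c", "d")).


def quartet_graph(quartets):
    """Split each quartet ab|cd into 'good' and 'bad' edges (assumed scheme).

    Good edges join taxa on opposite sides: (a,c), (a,d), (b,c), (b,d).
    Bad edges join siblings: (a,b), (c,d). A cut separating {a,b} from
    {c,d} crosses 4 good edges and 0 bad edges.
    """
    good, bad = [], []
    for (a, b), (c, d) in quartets:
        good += [(a, c), (a, d), (b, c), (b, d)]
        bad += [(a, b), (c, d)]
    return good, bad


def cut_score(side, good, bad):
    """Ratio of crossing good edges to crossing bad edges for side | rest."""
    def crosses(edge):
        return (edge[0] in side) != (edge[1] in side)

    g = sum(crosses(e) for e in good)
    b = sum(crosses(e) for e in bad)
    if b == 0:
        return float("inf") if g > 0 else 0.0
    return g / b


def best_cut(taxa, quartets):
    """Exhaustive search over all bipartitions, standing in for SDP MaxCut.

    Exponential in len(taxa): only for toy inputs. The returned side always
    contains taxa[0], so each bipartition is scored exactly once.
    """
    good, bad = quartet_graph(quartets)
    first, rest = taxa[0], taxa[1:]
    best, best_score = None, -1.0
    for k in range(len(rest)):  # k < len(rest) keeps the other side nonempty
        for extra in combinations(rest, k):
            side = frozenset((first, *extra))
            score = cut_score(side, good, bad)
            if score > best_score:  # strict '>' keeps the first maximum
                best, best_score = side, score
    return best, best_score


# Quartets induced by the tree ((A,B),C,(D,E)), plus one wrong quartet.
QUARTETS = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "E")),
    (("A", "B"), ("D", "E")),
    (("A", "C"), ("D", "E")),
    (("B", "C"), ("D", "E")),
    (("A", "D"), ("B", "C")),  # conflicts with the tree
]

side, score = best_cut(["A", "B", "C", "D", "E"], QUARTETS)
print(sorted(side), score)  # ['A', 'B', 'C'] 6.0
```

On this toy input, despite the deliberately wrong quartet AD|BC, the highest-scoring cut is ABC|DE, which is a split of the tree ((A,B),C,(D,E)) described by the other five quartets. A full divide and conquer algorithm would then recurse on each side; as noted above, that recursion is more complicated for unrooted quartets, and this sketch does not attempt it.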

INDEX TERMS

Phylogenetic reconstruction, quartets, MaxCut, supertree.

CITATION

Sagi Snir, Satish Rao, "Quartets MaxCut: A Divide and Conquer Quartets Algorithm," *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 7, no. 4, pp. 704-718, October-December 2010, doi:10.1109/TCBB.2008.133.

REFERENCES

- [1] A.V. Aho, Y. Sagiv, T.G. Szymanski, and J.D. Ullman, "Inferring a Tree from Lowest Common Ancestors with an Application to the Optimization of Relational Expressions," SIAM J. Computing, vol. 10, no. 3, pp. 405-421, 1981.
- [2] S. Arora, S. Rao, and U. Vazirani, "Expander Flows, Geometric Embeddings and Graph Partitioning," Proc. Symp. Foundations of Computer Science (FOCS), pp. 222-231, 2004.
- [3] B.R. Baum, "Combining Trees as a Way of Combining Data Sets for Phylogenetic Inference," Taxon, vol. 41, pp. 3-10, 1992.
- [4] A. Ben-Dor, B. Chor, D. Graur, R. Ophir, and D. Pelleg, "Constructing Phylogenies from Quartets: Elucidation of Eutherian Superordinal Relationships," J. Computational Biology, vol. 5, no. 3, pp. 377-390, 1998; earlier version appeared in Proc. RECOMB, 1998.
- [5] V. Berry and O. Gascuel, "Inferring Evolutionary Trees with Strong Combinatorial Evidence," Theoretical Computer Science, vol. 240, pp. 271-298, 2001.
- [6] M. Cardillo, O.R.P. Bininda Emonds, E. Boakes, and A. Purvis, "A Species-Level Phylogenetic Supertree of Marsupials," J. Zoology, vol. 264, no. 1, pp. 11-31, 2004.
- [7] M. Casanellas and J. Fernández-Sánchez, "Performance of a New Invariants Method on Homogeneous and Nonhomogeneous Quartet Trees," Molecular Biology and Evolution, vol. 24, no. 1, pp. 288-293, 2007.
- [8] M. Casanellas, L.D. Garcia, and S. Sullivant, "Catalog of Small Trees," Algebraic Statistics for Computational Biology, L. Pachter and B. Sturmfels, eds., chapter 15, pp. 291-305, Cambridge Univ. Press, 2005.
- [9] B. Chor, M. Hendy, B. Holland, and D. Penny, "Multiple Maxima of Likelihood in Phylogenetic Trees: An Analytic Approach," Molecular Biology and Evolution, vol. 17, no. 10, pp. 1529-1541, 2000; earlier version appeared in Proc. RECOMB, 2000.
- [10] B. Chor, A. Khetan, and S. Snir, "Maximum Likelihood Molecular Clock Comb: Analytic Solutions," J. Computational Biology, vol. 13, pp. 819-837, 2006; earlier version appeared in Proc. Seventh Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB '03).
- [11] B. Chor and S. Snir, "Molecular Clock Fork Phylogenies: Closed Form Analytic Maximum Likelihood Solutions," Systematic Biology, vol. 53, pp. 963-967, Dec. 2004.
- [12] C.J. Creevey and J.O. McInerney, "Clann: Investigating Phylogenetic Information through Supertree Analyses," Bioinformatics, vol. 21, no. 3, pp. 390-392, 2005.
- [13] C. Daskalakis, C. Hill, A. Jaffe, R. Mihaescu, E. Mossel, and S. Rao, "Maximal Accurate Forests from Distance Matrices," Proc. 10th Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB '06), 2006.
- [14] P. Erdös, M. Steel, L. Szekely, and T. Warnow, "A Few Logs Suffice to Build (Almost) All Trees (I)," Random Structures and Algorithms, vol. 14, pp. 153-184, 1999.
- [15] P. Erdös, M. Steel, L. Szekely, and T. Warnow, "A Few Logs Suffice to Build (Almost) All Trees (II)," Theoretical Computer Science, vol. 221, pp. 77-118, 1999.
- [16] O. Eulenstein, D. Chen, J.G. Burleigh, D. Fernández-Baca, and M.J. Sanderson, "Performance of Flip Supertrees with a Heuristic Algorithm," Systematic Biology, vol. 53, no. 2, pp. 299-308, 2004.
- [17] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman and Company, 1979.
- [18] M.X. Goemans and D.P. Williamson, "Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming," J. Assoc. for Computing Machinery, vol. 42, no. 6, pp. 1115-1145, Nov. 1995.
- [19] I. Gronau, S. Moran, and S. Snir, "Fast and Reliable Reconstruction of Phylogenetic Trees with Very Short Branches," Proc. ACM-SIAM Symp. Discrete Algorithms (SODA), pp. 379-388, 2008.
- [20] D. Huson, S. Nettles, and T. Warnow, "Disk-Covering, a Fast Converging Method for Phylogenetic Tree Reconstruction," J. Computational Biology, vol. 6, pp. 369-386, 1999.
- [21] E. Mossel, "Distorted Metrics on Trees and Phylogenetic Forests," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 4, no. 1, pp. 108-116, Jan.-Mar. 2007.
- [22] M. Wojciechowski, M. Sanderson, K. Steele, and A. Liston, "Molecular Phylogeny of the 'Temperate Herbaceous Tribes' of Papilionoid Legumes: A Supertree Approach," Advances in Legume Systematics, Part 9, P. Herendeen and A. Bruneau, eds., vol. 9, pp. 277-298, Royal Botanic Gardens, 2000.
- [23] S. Price, O. Bininda Emonds, and J. Gittleman, "A Complete Phylogeny of the Whales, Dolphins and Even-Toed Hoofed Mammals (Cetartiodactyla)," Biological Rev., vol. 80, no. 3, pp. 445-473, 2005.
- [24] M.A. Ragan, "Matrix Representation in Reconstructing Phylogenetic Relationships among the Eukaryotes," Biosystems, vol. 28, pp. 47-55, 1992.
- [25] B. Rannala, J.P. Huelsenbeck, Z. Yang, and R. Nielsen, "Taxon Sampling and the Accuracy of Large Phylogenies," Systematic Biology, vol. 47, pp. 702-710, 1998.
- [26] V. Ranwez and O. Gascuel, "Quartet-Based Phylogenetic Inference: Improvements and Limits," Molecular Biology and Evolution, vol. 18, pp. 1103-1116, 2001.
- [27] K. Rice, M. Donoghue, and R. Olmstead, "Analyzing Large Datasets: rbcL 500 Revisited," Systematic Biology, vol. 46, no. 3, pp. 554-563, 1997.
- [28] U. Roshan, B.M.E. Moret, T.L. Williams, and T. Warnow, "Rec-I-DCM3: A Fast Algorithmic Technique for Reconstructing Large Phylogenetic Trees," Proc. IEEE Computational Systems Bioinformatics Conf. (CSB), 2004.
- [29] S. Snir and S. Rao, "Using Max Cut to Enhance Rooted Trees Consistency," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 3, no. 4, pp. 323-333, Oct.-Dec. 2006; preliminary version appeared in Proc. WABI, 2005.
- [30] S. Snir, T. Warnow, and S. Rao, "Short Quartet Puzzling: A New Quartet-Based Phylogeny Reconstruction Algorithm," J. Computational Biology, vol. 15, no. 1, pp. 91-103, 2008.
- [31] M. Steel, "The Complexity of Reconstructing Trees from Qualitative Characters and Subtrees," J. Classification, vol. 9, no. 1, pp. 91-116, 1992.
- [32] K. Strimmer, N. Goldman, and A. von Haeseler, "Bayesian Probabilities and Quartet Puzzling," Molecular Biology and Evolution, vol. 14, pp. 210-211, 1997.
- [33] K. Strimmer and A. von Haeseler, "Quartet Puzzling: A Quartet Maximum-Likelihood Method for Reconstructing Tree Topologies," Molecular Biology and Evolution, vol. 13, no. 7, pp. 964-969, 1996, ftp://ftp.ebi.ac.uk/pub/software/unixpuzzle/.
- [34] S. Willson, "Building Phylogenetic Trees from Quartets by Using Local Inconsistency Measures," Molecular Biology and Evolution, vol. 16, pp. 685-693, 1998.