Issue No. 06, Nov.-Dec. 2013 (vol. 10)
pp. 1432-1441
Andre Wehe , Dept. of Biol., Univ. of Florida, Gainesville, FL, USA
J. Gordon Burleigh , Dept. of Comput. Sci., Iowa State Univ., Ames, IA, USA
Oliver Eulenstein , Dept. of Comput. Sci., Iowa State Univ., Ames, IA, USA
ABSTRACT
Phylogenetic inference is a computationally difficult problem, and constructing high-quality phylogenies that build upon existing phylogenetic knowledge and synthesize insights from new data remains a major challenge. We introduce knowledge-enhanced phylogenetic problems for both supertree and supermatrix phylogenetic analyses. These problems seek an optimal phylogenetic tree that can be assembled only from a user-supplied set of possibly incompatible phylogenetic relationships. We describe exact polynomial-time algorithms for the knowledge-enhanced versions of the NP-hard Robinson-Foulds, gene duplication, duplication and loss, and deep coalescence supertree problems. Further, we demonstrate that our algorithms can rapidly improve upon the results of local search heuristics for these problems. Finally, we introduce a knowledge-enhanced search heuristic that can be applied to any discrete character data set using the maximum parsimony (MP) phylogenetic problem. Although this approach is not guaranteed to find exact solutions, we show that it can also improve upon solutions from commonly used MP heuristics.
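To make the idea of a knowledge-enhanced problem concrete, the following Python sketch solves a toy version of the knowledge-enhanced Robinson-Foulds problem by dynamic programming over the user-supplied clusters. It is not the authors' algorithm. It assumes that all input trees are rooted and share one taxon set (no supertree-style partial taxon overlap), that trees are written as nested tuples of leaf names, and that the output must be binary; the function names are hypothetical. Under these assumptions every binary tree has the same number of nontrivial clusters, so minimizing the summed rooted RF distance is the same as maximizing how many input trees contain each chosen cluster. The best tree for each permitted cluster can therefore be built from its best split into two smaller permitted clusters, which takes time polynomial in the number of supplied clusters.

```python
"""Sketch: knowledge-enhanced rooted Robinson-Foulds (RF) tree via dynamic programming.

Assumptions (illustrative, not from the paper): all input trees are rooted and share
one taxon set, trees are nested tuples of leaf names, and the output tree is binary.
"""


def clusters(tree):
    """Return the set of clusters (leaf sets below every node) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def nontrivial(cs, n):
    """Drop singletons and the root cluster; the RF distance ignores them."""
    return {c for c in cs if 1 < len(c) < n}


def rf(s, t, n):
    """Rooted RF distance: size of the symmetric difference of nontrivial clusters."""
    return len(nontrivial(clusters(s), n) ^ nontrivial(clusters(t), n))


def knowledge_enhanced_rf(knowledge, inputs):
    """Binary tree built only from `knowledge` clusters that minimizes total RF distance.

    A binary tree on n taxa always has n - 2 nontrivial clusters, so minimizing the
    summed RF distance equals maximizing the total support of the chosen clusters,
    where support(X) = number of input trees that contain cluster X.
    """
    input_clusters = [clusters(t) for t in inputs]
    taxa = max(input_clusters[0], key=len)  # root cluster = whole taxon set
    cands = {frozenset(c) for c in knowledge if c and set(c) <= taxa}
    cands |= {frozenset([x]) for x in taxa} | {taxa}
    order = sorted(cands, key=lambda c: (len(c), sorted(c)))  # deterministic tie-breaking
    support = {c: sum(c in ic for ic in input_clusters) for c in cands}

    score, split = {}, {}
    for c in order:  # smaller clusters are always solved before larger ones
        if len(c) == 1:
            score[c], split[c] = support[c], None
            continue
        best = None
        for a in order:
            b = c - a
            if a < c and a in score and b in score:
                if best is None or score[a] + score[b] > best[0]:
                    best = (score[a] + score[b], a, b)
        if best is not None:  # c can be resolved into two permitted sub-clusters
            score[c], split[c] = support[c] + best[0], (best[1], best[2])

    if taxa not in score:
        return None  # the supplied clusters cannot assemble a binary tree

    def build(c):
        if split[c] is None:
            return next(iter(c))
        left, right = split[c]
        return (build(left), build(right))

    return build(taxa)


inputs = [(("a", "b"), ("c", "d")), (("a", "b"), "c", "d"), (("a", "c"), ("b", "d"))]
knowledge = [{"a", "b"}, {"c", "d"}, {"a", "c"}, {"b", "d"}, {"a", "b", "c"}]
best = knowledge_enhanced_rf(knowledge, inputs)
print(best, sum(rf(best, t, 4) for t in inputs))
```

With these three inputs the permitted split {a, b} | {c, d} beats both {a, b, c} | {d} and {a, c} | {b, d}, and the script prints `(('a', 'b'), ('c', 'd')) 5`, a total RF distance of 5. Supplying only `{a, b}` as knowledge returns `None`, because no pair of permitted clusters partitions the full taxon set.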
INDEX TERMS
Phylogeny, Algorithm design and analysis, Heuristic algorithms, Bioinformatics, Computational biology, Search problems, Supermatrix, Phylogenetics, Supertree
CITATION
Andre Wehe, J. Gordon Burleigh, Oliver Eulenstein, "Efficient Algorithms for Knowledge-Enhanced Supertree and Supermatrix Phylogenetic Problems," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1432-1441, Nov.-Dec. 2013, doi:10.1109/TCBB.2012.162