Issue No.05 - September/October (2011 vol.8)
pp: 1196-1207
Simone Battagliero, IBM Italia S.p.A., GBS BAO Advanced Analytics Services and MBLab, Bari
Giuseppe Puglia, IBM Italia S.p.A., GBS BAO Advanced Analytics Services and MBLab, Bari
Saverio Vicario, Consiglio Nazionale delle Ricerche-Istituto di Tecnologie Biomediche-Sede di Bari, Bari
Francesco Rubino, IBM Italia S.p.A., GBS BAO Advanced Analytics Services and MBLab, Bari
Gaetano Scioscia, IBM Italia S.p.A., GBS BAO Advanced Analytics Services and MBLab, Bari
Pietro Leo, IBM Italia S.p.A., GBS BAO Advanced Analytics Services and MBLab, Bari
ABSTRACT
The growing use of phylogeny in biological studies is held back by the lack of efficient tools for computing distances between trees. The geodesic tree distance, introduced by Billera, Holmes, and Vogtmann, combines tree topology and edge lengths into a single metric. Despite its conceptual simplicity, algorithms that compute the geodesic tree distance do not scale well to large, real-world phylogenetic trees with hundreds or even thousands of leaves. In this paper, we propose the geodesic distance as an effective tool for exploring the likelihood profile in the space of phylogenetic trees, and we give a cubic-time algorithm, GeoHeuristic, that computes an approximation of the distance. We compare it with the GTP algorithm, which calculates the exact distance, and with the cone path length, which is another approximation, and show that GeoHeuristic achieves a good trade-off between accuracy (relative error always lower than 0.0001) and efficiency. We also prove the equivalence among the GeoHeuristic, cone path, and Robinson-Foulds distances when all branch lengths are assumed equal to unity, and we show empirically that, under this restriction, these distances are almost always equal to the actual geodesic.
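To make two of the distances named in the abstract concrete, the following is a minimal Python sketch of the Robinson-Foulds distance and the cone path length. It is not the paper's GeoHeuristic or GTP implementation. It assumes each tree is given as a dictionary mapping internal splits (bipartitions of the leaf set) to edge lengths, and that pendant edges are ignored; the names `split`, `robinson_foulds`, and `cone_path_length` are hypothetical and chosen only for illustration.

```python
import math


def split(side, taxa):
    """Canonical form of a bipartition: the side that excludes the smallest taxon."""
    side = frozenset(side)
    all_taxa = frozenset(taxa)
    return all_taxa - side if min(all_taxa) in side else side


def robinson_foulds(t1, t2):
    """Number of splits that occur in exactly one of the two trees."""
    return len(set(t1) ^ set(t2))


def cone_path_length(t1, t2):
    """Length of the path that shrinks every non-shared edge of t1 to zero
    and then grows every non-shared edge of t2, while the lengths of the
    shared edges change linearly along the way."""
    common = set(t1) & set(t2)
    norm1 = math.sqrt(sum(l * l for s, l in t1.items() if s not in common))
    norm2 = math.sqrt(sum(l * l for s, l in t2.items() if s not in common))
    shared = sum((t1[s] - t2[s]) ** 2 for s in common)
    return math.sqrt((norm1 + norm2) ** 2 + shared)


# Two five-leaf trees with unit branch lengths (illustrative data only).
taxa = "ABCDE"
t1 = {split("AB", taxa): 1.0, split("ABC", taxa): 1.0}
t2 = {split("AB", taxa): 1.0, split("ABD", taxa): 1.0}

rf = robinson_foulds(t1, t2)
cone = cone_path_length(t1, t2)
print(rf, cone)
```

With unit branch lengths and the same number of non-shared splits k in each tree, this sketch gives a Robinson-Foulds distance of 2k and a cone path length of 2√k, so each value determines the other. This is only an observation about the sketch, not a restatement of the paper's proof.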
INDEX TERMS
Analysis of algorithms, phylogeny, tree distance, geodesic, discrete mathematics.
CITATION
Simone Battagliero, Giuseppe Puglia, Saverio Vicario, Francesco Rubino, Gaetano Scioscia, Pietro Leo, "An Efficient Algorithm for Approximating Geodesic Distances in Tree Space", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 5, pp. 1196-1207, September/October 2011, doi:10.1109/TCBB.2010.121
REFERENCES
[1] N. Amenta, M. Godwin, N. Postarnakevich, and K. St. John, “Approximating Geodesic Tree Distance,” Information Processing Letters, vol. 103, pp. 61-65, 2007.
[2] C. Berge, Graphs and Hypergraphs. Elsevier Science Ltd., 1985.
[3] L. Billera, S. Holmes, and K. Vogtmann, “Geometry of the Space of Phylogenetic Trees,” Advances in Applied Math., vol. 27, pp. 733-767, 2001.
[4] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2004.
[5] M.P. Cummings, S.A. Handley, D.S. Myers, D.L. Reed, A. Rokas, and K. Winka, “Comparing Bootstrap and Posterior Probability Values in the Four-Taxon Case,” Systematic Biology, vol. 52, pp. 477-487, 2003.
[6] A.J. Drummond and A. Rambaut, “BEAST: Bayesian Evolutionary Analysis by Sampling Trees,” BMC Evolutionary Biology, vol. 7, article no. 214, 2007.
[7] A.J. Drummond and K. Strimmer, “PAL: An Object-Oriented Programming Library for Molecular Evolution and Phylogenetics,” Bioinformatics, vol. 17, pp. 662-663, http://www.cebl.auckland.ac.nz/pal-project/, 2001.
[8] S.V. Edwards, “Is a New and General Theory of Molecular Systematics Emerging?,” Evolution, vol. 63, pp. 1-19, 2009.
[9] J. Felsenstein, Inferring Phylogenies. Sinauer Associates, 2004.
[10] W. Gilks, S. Richardson, and D. Spiegelhalter, Markov Chain Monte Carlo in Practice. Chapman & Hall, 1996.
[11] N. Goldman, J.P. Anderson, and A.G. Rodrigo, “Likelihood-Based Tests of Topologies in Phylogenetics,” Systematic Biology, vol. 49, pp. 652-670, 2000.
[12] E.F. Harding, “The Probabilities of Rooted Tree-Shapes Generated by Random Bifurcation,” Advances in Applied Probability, vol. 3, pp. 44-77, 1971.
[13] M. Holder and P.O. Lewis, “Phylogeny Estimation: Traditional and Bayesian Approaches,” Nature Rev. Genetics, vol. 4, pp. 275-284, 2003.
[14] P. Huggins, M. Owen, and R. Yoshida, “First Steps toward the Geometry of Cophylogeny,” arXiv:0809.1908v3.
[15] D.H. Janzen, M. Hajibabaei, J.M. Burns, W. Hallwachs, E. Remigio, and P.D.N. Hebert, “Wedding Biodiversity Inventory of a Large and Complex Lepidoptera Fauna with DNA Barcoding,” Philosophical Trans. Royal Soc. B, vol. 360, pp. 1835-1845, 2005.
[16] A. Kupczok, A. von Haeseler, and S. Klaere, “An Exact Algorithm for the Geodesic Distance between Phylogenetic Trees,” J. Computational Biology, vol. 15, pp. 577-591, 2008.
[17] Mesquite: A Modular System for Evolutionary Analysis, http://mesquiteproject.org/mesquite/mesquite.html, 2011.
[18] MrBayes: Bayesian Inference of Phylogeny, http://mrbayes.csit.fsu.edu, 2011.
[19] K. Munch, W. Boomsma, E. Willerslev, and R. Nielsen, “Fast Phylogenetic DNA Barcoding,” Philosophical Trans. Royal Soc. B, vol. 363, pp. 3997-4002, 2008.
[20] C. Lakner, P. Van der Mark, J.P. Huelsenbeck, B. Larget, and F. Ronquist, “Efficiency of Markov Chain Monte Carlo Tree Proposals in Bayesian Phylogenetics,” Systematic Biology, vol. 57, pp. 86-103, 2008.
[21] MUSCLE: Protein Multiple Sequence Alignment Software, http://www.drive5.com/muscle, 2011.
[22] M. Owen, “Distance Computation in the Space of Phylogenetic Trees,” http://www.cam.cornell.edu/~maowen/pubthesis.pdf, 2011.
[23] M. Owen and J.S. Provan, “A Fast Algorithm for Computing Geodesic Distances in Tree Space,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 8, no. 1, pp. 2-13, Jan./Feb. 2011.
[24] Python for S60, Project Home Page: https://garage.maemo.org/projects/pys60/, 2011.
[25] C.P. Robert and G. Casella, Monte Carlo Statistical Methods. Springer, 2004.
[26] D.F. Robinson and L.R. Foulds, “Comparison of Phylogenetic Trees,” Math. Biosciences, vol. 53, pp. 131-147, 1981.
[27] C.L. Schardl, K.D. Craven, S. Speakman, A. Stromberg, A. Lindstrom, and R. Yoshida, “A Novel Test for Host-Symbiont Codivergence Indicates Ancient Origin of Fungal Endophytes in Grasses,” Systematic Biology, vol. 57, no. 3, pp. 483-498, 2008.
[28] K. Vogtmann, “Geodesics in the Space of Trees,” www.math.cornell.edu/~vogtmann/papers/TreeGeodesicss/index.html, 2007.
[29] Z. Wang, F. López-Giráldez, and J.P. Townsend, “Snapshots of Tree Space,” Evolutionary Bioinformatics, vol. 5, pp. 133-136, 2009.
[30] D.J. Zwickl, “GARLI, Genetic Algorithm for Rapid Likelihood Inference,” version 0.94, http://code.google.com/p/garli/, 2006.