A Metric for Phylogenetic Trees Based on Matching
July-Aug. 2012 (vol. 9 no. 4)
pp. 1014-1022
Yu Lin, Laboratory for Computational Biology and Bioinformatics, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
V. Rajan, Laboratory for Computational Biology and Bioinformatics, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
B. M. E. Moret, Laboratory for Computational Biology and Bioinformatics, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems ranging from computational cost to lack of robustness; many can be shown to behave unexpectedly on certain plausible inputs. For instance, the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes: reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce a new pairwise distance measure for phylogenetic trees, based on matching. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering compared with clustering the same collections of trees using the Robinson-Foulds distance.
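The contrast drawn above between the Robinson-Foulds distance and a matching-based distance can be illustrated with a small Python sketch. Each internal edge of an unrooted tree is reduced to the bipartition (split) of the leaves it induces. RF counts the splits present in only one tree, while the matching variant pairs the splits of the two trees one-to-one at minimum total cost. Several choices here are assumptions of this sketch, not the authors' definition or their polynomial-time algorithm: trees are written as nested tuples, the function names are hypothetical, and the per-pair cost is the number of leaves on opposite sides, minimized over the two orientations of a split. The optimal pairing is also found by brute force over all permutations. A practical version would instead solve the assignment problem, for example with the Hungarian method.

```python
from itertools import permutations


def leaf_set(tree):
    """Return the leaves of a nested-tuple tree as a frozenset."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaf_set(child) for child in tree))
    return frozenset([tree])


def splits(tree):
    """Return (leaves, nontrivial splits) of the unrooted tree underlying `tree`.

    Each split A|B is stored as the side that does NOT contain the smallest
    leaf, so both ends of the root edge collapse to a single entry.
    """
    leaves = leaf_set(tree)
    ref, n = min(leaves), len(leaves)
    found = set()

    def walk(t):
        if not isinstance(t, tuple):
            return frozenset([t])
        clade = frozenset().union(*(walk(child) for child in t))
        side = leaves - clade if ref in clade else clade
        if 2 <= len(side) <= n - 2:  # skip trivial leaf splits and the root
            found.add(side)
        return clade

    walk(tree)
    return leaves, found


def rf_distance(t1, t2):
    """Robinson-Foulds distance: splits found in exactly one tree.

    (Some authors report half of this count.)
    """
    _, s1 = splits(t1)
    _, s2 = splits(t2)
    return len(s1 ^ s2)


def split_cost(a, b, leaves):
    """Hypothetical pair cost: leaves on opposite sides, best orientation."""
    return min(len(a ^ b), len(a ^ (leaves - b)))


def matching_distance(t1, t2):
    """Minimum-cost perfect matching between the splits of two binary trees.

    Brute force over all pairings, so only suitable for small trees.
    """
    leaves, s1 = splits(t1)
    leaves2, s2 = splits(t2)
    if leaves != leaves2 or len(s1) != len(s2):
        raise ValueError("expected binary trees on the same leaf set")
    s1, s2 = list(s1), list(s2)
    return min(
        sum(split_cost(a, s2[j], leaves) for a, j in zip(s1, perm))
        for perm in permutations(range(len(s2)))
    )


# Swapping leaves 2 and 3 in a 6-leaf tree.
t1 = (((1, 2), 3), (4, (5, 6)))
t2 = (((1, 3), 2), (4, (5, 6)))
print(rf_distance(t1, t2), matching_distance(t1, t2))  # 2 2

# Moving leaf 1 from one end of an 8-leaf caterpillar to the other.
c1 = (1, (2, (3, (4, (5, (6, (7, 8)))))))
c2 = (2, (3, (4, (5, (6, (7, (8, 1)))))))
print(rf_distance(c1, c2), matching_distance(c1, c2))  # 10 6
```

In the caterpillar example, relocating a single leaf leaves no split shared between the two trees. RF therefore jumps to its maximum of 2(n-3) = 10. Under this sketch's pair cost, the matching value is only 6, because most splits can be paired with a counterpart that differs solely in the position of leaf 1.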


Index Terms:
genetics, bioinformatics, botany, evolution (biological), Robinson-Foulds distance, phylogenetic trees, computational biology, pairwise measurement, pairwise distance measurement, statistical testing, hierarchical clustering, phylogeny, robustness, time measurement, polynomials, TBR, matching distance, NNI, SPR
Yu Lin, V. Rajan, B. M. E. Moret, "A Metric for Phylogenetic Trees Based on Matching," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1014-1022, July-Aug. 2012, doi:10.1109/TCBB.2011.157