Metrics on Multilabeled Trees: Interrelationships and Diameter Bounds
July/August 2011 (vol. 8 no. 4)
pp. 1029-1040
Katharina T. Huber, University of East Anglia, Norwich
Andreas Spillner, University of Greifswald, Greifswald
Radosław Suchecki, University of East Anglia, Norwich
Vincent Moulton, University of East Anglia
Multilabeled trees, or MUL-trees for short, are trees whose leaves are labeled by elements of some nonempty finite set X such that more than one leaf may be labeled by the same element of X. This class of trees includes phylogenetic trees and tree shapes. MUL-trees arise naturally in, for example, biogeography and gene evolution studies, as well as in phylogenetic network reconstruction. In this paper, we introduce novel metrics for comparing MUL-trees, most of which generalize well-known metrics on phylogenetic trees and tree shapes. These metrics can be used, for example, to better understand the space of MUL-trees or to help visualize collections of MUL-trees. In addition, we describe some relationships between the MUL-tree metrics that we present and give some novel diameter bounds for them. We conclude by briefly discussing some open problems and pointing out how MUL-tree metrics may be used to define metrics on the space of phylogenetic networks.

Index Terms:
Multilabeled tree, MUL-tree, tree space, metric, domination, diameter bound.
Katharina T. Huber, Andreas Spillner, Radosław Suchecki, Vincent Moulton, "Metrics on Multilabeled Trees: Interrelationships and Diameter Bounds," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 4, pp. 1029-1040, July-Aug. 2011, doi:10.1109/TCBB.2010.122