Subscribe
Issue No.02 - April-June (2008 vol.5)
pp: 235-244
ABSTRACT
In conservation biology it is a central problem to measure, predict, and preserve biodiversity as species face extinction. In 1992 Faith proposed measuring the diversity of a collection of species in terms of their relationships on a phylogenetic tree, and to use this information to identify collections of species with high diversity. Here we are interested in some variants of the resulting optimization problem that arise when considering species whose evolution is better represented by a network rather than a tree. More specifically, we consider the problem of computing phylogenetic diversity relative to a split system on a collection of species of size $n$. We show that for general split systems this problem is NP-hard. In addition we provide some efficient algorithms for some special classes of split systems, in particular presenting an optimal $O(n)$ time algorithm for phylogenetic trees and an $O(n\log n + n k)$ time algorithm for choosing an optimal subset of size $k$ relative to a circular split system.
INDEX TERMS
Biology and genetics, Life and Medical Sciences
CITATION
Andreas Spillner, Binh T. Nguyen, Vincent Moulton, "Computing Phylogenetic Diversity for Split Systems", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.5, no. 2, pp. 235-244, April-June 2008, doi:10.1109/TCBB.2007.70260
REFERENCES
 [1] D.P. Faith, “Conservation Evaluation and Phylogenetic Diversity,” Biological Conservation, vol. 61, pp. 1-10, 1992. [2] G.M. Barker, “Phylogenetic Diversity: A Quantitative Framework for Measurement of Priority and Achievement in Biodiversity Conservation,” Biological J. Linnean Soc., vol. 76, pp. 165-194, 2002. [3] D.P. Faith and A.M. Baker, “Phylogenetic Diversity (PD) and Biodiversity Conservation: Some Bioinformatics Challenges,” Evolutionary Bioinformatics Online, vol. 2, pp. 70-77, 2006. [4] K. Hartmann and M. Steel, “Phylogenetic Diversity: From Combinatorics to Ecology,” New Math. Models in Evolution, O.Gascuel and M. Steel, eds., Oxford Univ. Press, 2006. [5] F. Pardi and N. Goldman, “Species Choice for Comparative Genomics: Being Greedy Works,” PLoS Genetics, vol. 1, no. 6, 2005. [6] D. Faith and K. Williams, “Phylogenetic Diversity and Biodiversity Planning,” McGraw-Hill Yearbook of Science and Technology, pp.233-235, McGraw-Hill Professional, 2006. [7] M. Steel, “Phylogenetic Diversity and the Greedy Algorithm,” Systematic Biology, vol. 54, pp. 527-529, 2005. [8] B.Q. Minh, S. Klaere, and A. von Haeseler, “Phylogenetic Diversity within Seconds,” Systematic Biology, vol. 55, pp. 769-773, 2006. [9] M.L. Weitzmann, “The Noah's Ark Problem,” Econometrica, vol. 66, pp. 1279-1298, 1998. [10] K. Hartmann and M. Steel, “Maximizing Phylogenetic Diversity in Biodiversity Conservation: Greedy Solutions to the Noah's Ark Problem,” Systematic Biology, vol. 55, pp. 644-651, 2006. [11] F. Pardi and N. Goldman, “Resource-Aware Taxon Selection for Maximizing Phylogenetic Diversity,” Systematic Biology, vol. 56, pp. 431-444, 2007. [12] V. Moulton, C. Semple, and M. Steel, “Optimizing Phylogenetic Diversity under Constraints,” J. Theoretical Biology, vol. 246, pp.186-194, 2007. [13] D. Bryant and D. Huson, “Application of Phylogenetic Networks in Evolutionary Studies,” Molecular Biology and Evolution, vol. 23, pp. 254-267, 2006. [14] M. Kotetishvili, O. Stine, A. Kreger, J. Morris, and A. Sulakvelidze, “Multilocus Sequence Typing for Characterization of Clinical and Environmental Salmonella Strains,” J. Clinical Microbiology, vol. 51, pp. 1626-1635, 2002. [15] D. Bryant and V. Moulton, “Neighbornet: An Agglomerative Method for the Construction of Phylogenetic Networks,” Molecular Biology and Evolution, vol. 21, pp. 255-265, 2004. [16] B.Q. Minh, S. Klaere, and A. von Haeseler, “Phylogenetic Diversity Algorithm,” http: //www.cibiv.at/softwarepda/, Aug. 2007. [17] B.Q. Minh, S. Klaere, and A. von Haeseler, “Phylogenetic Diversity on Split Networks,” unpublished manuscript, 2007. [18] B. Holland, K.T. Huber, V. Moulton, and P. Lockhart, “Using Consensus Networks to Visualize Contradictory Evidence for Species Phylogeny,” Molecular Biology and Evolution, vol. 21, pp.1459-1461, 2004. [19] P. Buneman, “The Recovery of Trees from Measures of Dissimilarity,” Math. in the Archeological and Historical Sciences, F.R.Hodson, D.G. Kendall, and P. Tautu, eds., pp. 387-395, Edinburgh Univ. Press, 1971. [20] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, 1979. [21] D. Hochbaum, “Approximating Covering and Packing Problems: Set Cover, Vertex Cover, Independent Set, and Related Problems,” Approximation Algorithms for NP-Hard Problems, D. Hochbaum, ed., PWS Publishing, 1997. [22] M. Bordewich and C. Semple, “Nature Reserve Selection Problem: A Tight Approximation Algorithm,“ Technical Report UCDMS2007/1, NZ, Univ. of Canterbury, 2007. [23] P. Alimonti and V. Kann, “Hardness of Approximating Problems on Cubic Graphs,” Proc. Italian Conf. Algorithms and Complexity, pp. 288-298, 1997. [24] C. Semple and M. Steel, Phylogenetics. Oxford Univ. Press, 2003. [25] A. Shioura and T. Uno, “A Linear Time Algorithm for Finding a $k\hbox{-}{\rm Tree}$ Core,” J. Algorithms, vol. 23, pp. 281-290, 1997. [26] M. Blum, R.W. Floyd, V.R. Pratt, R.L. Rivest, and R.E. Tarjan, “Time Bounds for Selection,” J. Computer and System Sciences, vol. 7, pp. 448-461, 1973. [27] R. Uehara and Y. Uno, “Efficient Algorithms for the Longest Path Problem,” Proc. Int'l Symp. Algorithms and Computation, pp. 871-883, 2004. [28] D. Bryant and A. Dress, “Linearly Independent Split Systems,” European J. Combinatorics, vol. 28, pp. 1814-1831, 2007. [29] D. Bryant, V. Moulton, and A. Spillner, “Computing Planar Split Graphs,” The Ann. New Zealand Phylogenetics Meeting, 2007. [30] D. Eppstein, M.H. Overmars, G. Rote, and G.J. Woeginger, “Finding Minimum Area $k\hbox{-}{\rm Gons}$ ,” Discrete and Computational Geometry, vol. 7, pp. 45-58, 1992. [31] H. Edelsbrunner, Algorithms in Combinatorial Geometry. Springer, 1987. [32] J.E. Boyce, D.P. Dobkin, R.L. Drysdale, and L.J. Guibas, “Finding Extremal Polygons,” SIAM J. Computing, vol. 14, pp. 134-147, 1985. [33] H.J. Bandelt and A. Dress, “A Canonical Split Decomposition Theory for Metrics on a Finite Set,” Advances in Math., vol. 92, pp.47-105, 1992. [34] K. Kalmanson, “Edgeconvex Circuits and the Travelling Salesman Problem,” Canadian J. Math., vol. 27, pp. 1000-1010, 1975. [35] V. Chepoi and B. Fichet, “A Note on Circular Decomposable Metrics,” Geometriae Dedicata, vol. 69, pp. 237-240, 1998. [36] A. Aggarwal, M.M. Klawe, S. Moran, P. Shor, and R. Wilber, “Geometric Applications of a Matrix-Searching Algorithm,” Algorithmica, vol. 2, pp. 195-208, 1987. [37] H. Schwöbbermeyer and J.T. Kim, “A Comparative Analysis of Biodiversity Measures,” Proc. European Conf. Advances in Artificial Life, pp. 119-128, 1999. [38] A.S.L. Rodrigues and K.J. Gaston, “Maximising Phylogenetic Diversity in the Selection of Networks of Conservation Areas,” Biological Conservation, vol. 105, pp. 103-111, 2002. [39] L. Lewis and P. Lewis, “Unearthing the Molecular Phylodiversity of Desert Soil Green Algae (Chlorophyta),” Systematic Biology, vol. 54, pp. 936-947, 2005. [40] A. Blum, P. Chalasani, D. Coppersmith, W.R. Pulleyblank, P. Raghavan, and M. Sudan, “The Minimum Latency Problem,” Proc. ACM Symp. Theory of Computing, pp. 163-171, 1994. [41] B.R. Holland, “Evolutionary Analysis of Large Data Sets: Trees and Beyond,” PhD dissertation, Massey Univ., 2001. [42] S.S. Ravi, D.J. Rosenkrantz, and G.K. Tayi, “Heuristic and Special Case Algorithms for Dispersion Problems,” Operations Research, vol. 42, pp. 299-310, 1994. [43] B. Chandra and M. Halldórsson, “Approximation Algorithms for Dispersion Problems,” J. Algorithms, vol. 38, pp. 438-465, 2001. [44] R. Chandrasekaran and A. Daughety, “Location on Tree Networks: $p\hbox{-}{\rm Centre}$ and $n\hbox{-}{\rm Dispersion}$ Problems,” Math. Operations Research, vol. 6, pp. 50-57, 1981. [45] G.A. Waterson, “On the Number of Segregating Sites in Genetical Models without Recombination,” Theoretical Population Biology, vol. 7, pp. 256-276, 1975. [46] G.M. Ziegler, Lectures on Polytopes. Springer, 1995.