On the Elusiveness of Clusters
March/April 2012 (vol. 9 no. 2)
pp. 517-534
Steven M. Kelk, Maastricht University, Maastricht
Celine Scornavacca, Tübingen University, Tübingen
Leo van Iersel, University of Canterbury, Christchurch
Rooted phylogenetic networks are often used to represent conflicting phylogenetic signals. Given a set of clusters, a network is said to represent these clusters in the softwired sense if, for each cluster in the input set, at least one tree embedded in the network contains that cluster. Motivated by parsimony, we might wish to construct such a network using as few reticulations as possible, or minimizing the level of the network, i.e., the maximum number of reticulations used in any "tangled" region of the network. Although these are NP-hard problems, here we prove that, for every fixed $k \ge 0$, there is a polynomial-time algorithm that constructs a phylogenetic network with level equal to $k$ representing a cluster set, or determines that no such network exists. However, this algorithm does not lend itself to a practical implementation. We also prove that the comparatively efficient Cass algorithm correctly solves this problem (and also minimizes the reticulation number) when input clusters are obtained from two, not necessarily binary, gene trees on the same set of taxa, but does not always minimize level for general cluster sets. Finally, we describe a new algorithm which generates, in polynomial time, all binary phylogenetic networks with exactly $r$ reticulations representing a set of input clusters (for every fixed $r \ge 0$).
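
The softwired definition can be illustrated with a minimal brute-force sketch. It is not the Cass algorithm or any algorithm from the article. It simply tries every way of keeping one incoming edge per reticulation, which takes time exponential in the number of reticulations. The function names, the dictionary encoding of the network, and the three-taxon example are all assumptions made for illustration.

```python
from itertools import product


def softwired_clusters(children, leaves):
    """Collect every cluster found in some tree embedded in the network.

    children: dict mapping each internal node to a list of its children
              (hypothetical encoding of a rooted acyclic network).
    leaves:   set of node names that are taxa.
    """
    # Record the parents of every node; a reticulation has more than one.
    parents = {}
    for u, kids in children.items():
        for v in kids:
            parents.setdefault(v, []).append(u)
    retics = [v for v, ps in parents.items() if len(ps) > 1]

    found = set()
    # A switching keeps exactly one incoming edge for each reticulation.
    for choice in product(*(parents[r] for r in retics)):
        kept = dict(zip(retics, choice))

        def below(v):
            # Taxa reachable from v using only switched-on edges.
            if v in leaves:
                return frozenset([v])
            taxa = set()
            for c in children.get(v, []):
                if c in kept and kept[c] != v:
                    continue  # edge (v, c) is switched off in this tree
                taxa |= below(c)
            return frozenset(taxa)

        for v in set(children) | set(leaves):
            cluster = below(v)
            if cluster:
                found.add(cluster)
    return found


def represents(children, leaves, clusters):
    """True if every input cluster appears in at least one embedded tree."""
    found = softwired_clusters(children, leaves)
    return all(frozenset(c) in found for c in clusters)


# Hypothetical network with one reticulation h (parents u and w) above taxon b.
net = {"root": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"]}
taxa = {"a", "b", "c"}
ok = represents(net, taxa, [{"a", "b"}, {"b", "c"}])
```

In this example the clusters {a, b} and {b, c} cannot both appear in a single tree, yet the network with one reticulation represents both, because each arises from a different choice of parent for h.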

Index Terms:
Rooted phylogenetic networks, clusters, reticulate evolution, parsimony, computational complexity, polynomial-time algorithms.
Steven M. Kelk, Celine Scornavacca, Leo van Iersel, "On the Elusiveness of Clusters," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 517-534, March-April 2012, doi:10.1109/TCBB.2011.128