This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
An Ω(n^2/ log n) Speed-Up of TBR Heuristics for the Gene-Duplication Problem
October-December 2008 (vol. 5 no. 4)
pp. 514-524
Mukul S. Bansal, Iowa State University, Ames
Oliver Eulenstein, Iowa State University, Ames
The gene-duplication problem is to infer a species supertree from gene trees that are confounded by complex histories of gene duplications. This problem is NP-hard and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. We improve on the time complexity of the local search problem by a factor of n2= log n, where n is the size of the resulting species supertree. Typically, several thousand instances of the local search problem are solved throughout a stepwise heuristic search. Hence, our improvement makes the gene-duplication problem much more tractable for large-scale phylogenetic analyses.

[1] R. Guigó, I. Muchnik, and T.F. Smith, “Reconstruction of Ancient Molecular Phylogeny,” Molecular Phylogenetics and Evolution, vol. 6, no. 2, pp. 189-213, 1996.
[2] B. Ma, M. Li, and L. Zhang, “On Reconstructing Species Trees from Gene Trees in Term of Duplications and Losses,” Proc. Second Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB '98), S. Istrail, P. Pevzner, and M. Waterman, eds., pp. 182-191, 1998.
[3] R.D.M. Page, “GeneTree: Comparing Gene and Species Phylogenies Using Reconciled Trees,” Bioinformatics, vol. 14, no. 9, pp. 819-820, 1998.
[4] J.B. Slowinski, A. Knight, and A.P. Rooney, “Inferring Species Trees from Gene Trees: A Phylogenetic Analysis of the Elapidae (Serpentes) Based on the Amino Acid Sequences of Venom Proteins,” Molecular Phylogenetics and Evolution, vol. 8, no. 3, pp.349-362, 1997.
[5] R.D.M. Page, “Extracting Species Trees from Complex Gene Trees: Reconciled Trees and Vertebrate Phylogeny,” Molecular Phylogenetics and Evolution, vol. 14, no. 1, pp. 89-106, 2000.
[6] R.D.M. Page and J. Cotton, “Vertebrate Phylogenomics: Reconciled Trees and Gene Duplications,” Proc. Seventh Pacific Symp. Biocomputing (PSB '02), R.B. Altman, A.K. Dunker, L. Hunter, K.Lauderdale and T.E. Klein, eds., pp. 536-547, Jan. 2002.
[7] J.A. Cotton and R.D.M. Page, “Tangled Tales from Multiple Markers: Reconciling Conflict between Phylogenies to Build Molecular Supertrees,” Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, O.R.P. Bininda-Emonds, ed., pp. 107-125, Springer, 2004.
[8] M.J. Sanderson and M.M. McMahon, “Inferring Angiosperm Phylogeny from EST Data with Widespread Gene Duplication,” BMC Evolutionary Biology, vol. 7, supplementary 1:S3, 2007.
[9] D.L. Swofford, G.J. Olsen, P.J. Waddel, and D.M. Hillis, “Phylogenetic Inference,” Molecular Systematics, D.M. Hillis, C.Moritz, and B.K. Mable, eds., chapter 11, pp.407-509, Sinauer Assoc., 1996.
[10] B.L. Allen and M. Steel, “Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees,” Annals of Combinatorics, vol. 5, pp. 1-13, 2001.
[11] D. Chen, O. Eulenstein, D. Fernández-Baca, and J.G. Burleigh, “Improved Heuristics for Minimum-Flip Supertree Construction,” Evolutionary Bioinformatics, vol. 2, 2006.
[12] M. Goodman, J. Czelusniak, G.W. Moore, A.E. Romero-Herrera, and G. Matsuda, “Fitting the Gene Lineage into Its Species Lineage. A Parsimony Strategy Illustrated by Cladograms Constructed from Globin Sequences,” Systematic Zoology, vol. 28, pp.132-163, 1979.
[13] R.D.M. Page, “Maps between Trees and Cladistic Analysis of Historical Associations among Genes, Organisms, and Areas,” Systematic Biology, vol. 43, no. 1, pp. 58-77, 1994.
[14] B. Mirkin, I. Muchnik, and T.F. Smith, “A Biologically Consistent Model for Comparing Molecular Phylogenies,” J. Computational Biology, vol. 2, no. 4, pp. 493-507, 1995.
[15] O. Eulenstein, “Predictions of Gene-Duplications and Their Phylogenetic Development,” PhD dissertation, Univ. of Bonn, Germany. gMD Research Series No. 20/1998, ISSN:1435-2699, 1998.
[16] L. Zhang, “On a Mirkin-Muchnik-Smith Conjecture for Comparing Molecular Phylogenies,” J. Computational Biology, vol. 4, no. 2, pp. 177-187, 1997.
[17] K. Chen, D. Durand, and M. Farach-Colton, “NOTUNG: A Program for Dating Gene Duplications and Optimizing Gene Family Trees,” J. Computational Biology, vol. 7, no. 3/4, 2000.
[18] P. Bonizzoni, G.D. Vedova, and R. Dondi, “Reconciling a Gene Tree to a Species Tree under the Duplication Cost Model,” Theoretical Computer Science, vol. 347, nos. 1-2, pp. 36-53, 2005.
[19] P. Górecki and J. Tiuryn, “On the Structure of Reconciliations,” Proc. Second RECOMB Comparative Genomics Satellite Workshop, J.Lagergren, ed., pp. 42-54, 2004.
[20] M.A. Bender and M. Farach-Colton, “The LCA Problem Revisited,” Proc. Fourth Latin Am. Symp. Theoretical Informatics (LATIN '00), G.H. Gonnet, D. Panario, and A. Viola, eds., pp. 88-94, 2000.
[21] D. Harel and R.E. Tarjan, “Fast Algorithms for Finding Nearest Common Ancestors,” SIAM J. Computing, vol. 13, no. 2, pp.338-355, 1984.
[22] M.R. Fellows, M.T. Hallett, C. Korostensky, and U. Stege, “Analogs and Duals of the MAST Problem for Sequences and Trees,” Proc. Sixth Ann. European Symp. Algorithms (ESA '98), G.Bilardi, G.F.Italiano, A. Pietracaprina, and G. Pucci, eds., pp.103-114, 1998.
[23] U. Stege, “Gene Trees and Species Trees: The Gene-Duplication Problem Is Fixed-Parameter Tractable,” Proc. Sixth Int'l Workshop Algorithms and Data Structures (WADS '99), F.K.H.A. Dehne, A.Gupta, J.-R. Sack, and R. Tamassia, eds., pp. 288-293, 1999.
[24] M.T. Hallett and J. Lagergren, “New Algorithms for the Duplication-Loss Model,” Proc. Fourth Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB '00), R. Shamir, S.Miyano, S. Istrail, P. Pevzner, and M. Waterman, eds., pp.138-146, 2000.
[25] M. Bordewich and C. Semple, “On the Computational Complexity of the Rooted Subtree Prune and Regraft Distance,” Annals of Combinatorics, vol. 8, pp. 409-423, 2004.
[26] M.S. Bansal, J.G. Burleigh, O. Eulenstein, and A. Wehe, “Heuristics for the Gene-Duplication Problem: A $\Theta(n)$ Speed-Up for the Local Search,” Proc. 11th Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB '07), T.P. Speed and H. Huang, eds., pp. 238-252, 2007.
[27] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry: Algorithms and Applications. Springer, 2000.

Index Terms:
Algorithms, Phylogenetics, Computational Biology, Gene Duplication, Supertrees
Citation:
Mukul S. Bansal, Oliver Eulenstein, "An Ω(n^2/ log n) Speed-Up of TBR Heuristics for the Gene-Duplication Problem," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 4, pp. 514-524, Oct.-Dec. 2008, doi:10.1109/TCBB.2008.69
Usage of this product signifies your acceptance of the Terms of Use.