The Community for Technology Leaders
RSS Icon
Subscribe
Issue No.02 - April-June (2009 vol.6)
pp: 221-231
Mukul S. Bansal , Iowa State University, Ames
Oliver Eulenstein , Iowa State University, Ames
André Wehe , Iowa State University, Ames
ABSTRACT
The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene-duplication events. This problem is NP-complete and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. A classical local search problem is the {\tt NNI} search problem, which is based on the nearest neighbor interchange operation. In this work, we 1) provide a novel near-linear time algorithm for the {\tt NNI} search problem, 2) introduce extensions that significantly enlarge the search space of the {\tt NNI} search problem, and 3) present algorithms for these extended versions that are asymptotically just as efficient as our algorithm for the {\tt NNI} search problem. The exceptional speedup achieved in the extended {\tt NNI} search problems makes the gene-duplication problem more tractable for large-scale phylogenetic analyses. We verify the performance of our algorithms in a comparison study using sets of large randomly generated gene trees.
INDEX TERMS
Computational phylogenetics, gene-duplication, supertrees, local search, {\tt NNI}.
CITATION
Mukul S. Bansal, Oliver Eulenstein, André Wehe, "The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI-Based Local Searches", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.6, no. 2, pp. 221-231, April-June 2009, doi:10.1109/TCBB.2009.7
REFERENCES
[1] M.S. Bansal and O. Eulenstein, “The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI-Based Local Searches,” Proc. Int'l Symp. Bioinformatics Research and Applications (ISBRA), pp.14-25, May 2008.
[2] M. Goodman, J. Czelusniak, G.W. Moore, A.E. Romero-Herrera, and G. Matsuda, “Fitting the Gene Lineage into Its Species Lineage: A Parsimony Strategy Illustrated by Cladograms Constructed from Globin Sequences,” Systematic Zoology, vol. 28, pp.132-163, 1979.
[3] R. Guigó, I. Muchnik, and T.F. Smith, “Reconstruction of Ancient Molecular Phylogeny,” Molecular Phylogenetics and Evolution, vol. 6, no. 2, pp.189-213, 1996.
[4] I. Wapinski, A. Pfeffer, N. Friedman, and A. Regev, “Automatic Genome-Wide Reconstruction of Phylogenetic Gene Trees,” Proc. Int'l Conf. Intelligent Systems for Molecular Biology/European Conf. Computational Biology (ISMB/ECCB) (Supplement of Bioinformatics), pp.549-558, July 2007.
[5] I. Wapinski, A. Pfeffer, N. Friedman, and A. Regev, “Natural History and Evolutionary Principles of Gene Duplication in Fungi,” Nature, vol. 449, pp.54-61, 2007.
[6] L. Arvestad, A.-C. Berglund, J. Lagergren, and B. Sennblad, “Bayesian Gene/Species Tree Reconciliation and Orthology Analysis Using MCMC,” Proc. Int'l Conf. Intelligent Systems for Molecular Biology (ISMB) (Supplement of Bioinformatics), pp.7-15, 2003.
[7] L. Arvestad, A.-C. Berglund, J. Lagergren, and B. Sennblad , “Gene Tree Reconstruction and Orthology Analysis Based on an Integrated Model for Duplications and Sequence Evolution,” Proc. Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB), pp.326-335, 2004.
[8] B. Ma, M. Li, and L. Zhang, “From Gene Trees to Species Trees,” SIAM J. Computing, vol. 30, no. 3, pp.729-752, 2000.
[9] R.D.M. Page, “GeneTree: Comparing Gene and Species Phylogenies Using Reconciled Trees,” Bioinformatics, vol. 14, no. 9, pp.819-820, 1998.
[10] J. Felsenstein, Inferring Phylogenies, pp. 37-46. Sinauer Assoc., 2004.
[11] C. Semple and M. Steel, Phylogenetics, pp. 30-33. Oxford Univ. Press, 2003.
[12] B.L. Allen and M. Steel, “Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees,” Annals of Combinatorics, vol. 5, pp.1-13, 2001.
[13] M. Bordewich and C. Semple, “On the Computational Complexity of the Rooted Subtree Prune and Regraft Distance,” Annals of Combinatorics, vol. 8, pp.409-423, 2004.
[14] R.D.M. Page and M.A. Charleston, “From Gene to Organismal Phylogeny: Reconciled Trees and the Gene Tree/Species Tree Problem,” Molecular Phylogenetics and Evolution, vol. 7, pp.231-240, 1997.
[15] D.L. Swofford, G.J. Olsen, P.J. Waddel, and D.M. Hillis, “Phylogenetic Inference,” Molecular Systematics, D.M. Hillis, C.Moritz, and B.K. Mable, eds., chapter 11, pp.407-509, Sinauer Assoc., 1996.
[16] D. Chen, O. Eulenstein, D. Fernández-Baca, and J.G. Burleigh, “Improved Heuristics for Minimum-Flip Supertree Construction,” Evolutionary Bioinformatics, vol. 2, pp.347-356, 2006.
[17] M.S. Bansal, J.G. Burleigh, O. Eulenstein, and A. Wehe, “Heuristics for the Gene-Duplication Problem: A $\Theta (n)$ Speedup for the Local Search,” Proc. Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB), pp.238-252, 2007.
[18] M.S. Bansal and O. Eulenstein, “An $\Omega (n^2/ \log n)$ Speedup of TBR Heuristics for the Gene-Duplication Problem,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 5, no. 4, pp.514-524, Oct.-Dec. 2008.
[19] G. Ganapathy, V. Ramachandran, and T. Warnow, “Better Hill-Climbing Searches for Parsimony,” Proc. Workshop Algorithms in Bioinformatics (WABI), pp.245-258, 2003.
[20] G. Ganapathy, V. Ramachandran, and T. Warnow, “On Contract-and-Refine Transformations between Phylogenetic Trees,” Proc. Symp. Discrete Algorithms (SODA), pp.900-909, 2004.
[21] R.D.M. Page, “Maps between Trees and Cladistic Analysis of Historical Associations Among Genes, Organisms, and Areas,” Systematic Biology, vol. 43, no. 1, pp.58-77, 1994.
[22] B. Mirkin, I. Muchnik, and T.F. Smith, “A Biologically Consistent Model for Comparing Molecular Phylogenies,” J. Computational Biology, vol. 2, no. 4, pp.493-507, 1995.
[23] O. Eulenstein, “Predictions of Gene-Duplications and Their Phylogenetic Development,” PhD dissertation, Univ. of Bonn, GMD Research Series no. 20/1998, 1998.
[24] L. Zhang, “On a Mirkin-Muchnik-Smith Conjecture for Comparing Molecular Phylogenies,” J. Computational Biology, vol. 4, no. 2, pp.177-187, 1997.
[25] K. Chen, D. Durand, and M. Farach-Colton, “Notung: A Program for Dating Gene Duplications and Optimizing Gene Family Trees,” J. Computational Biology, vol. 7, pp.429-447, 2000.
[26] P. Bonizzoni, G.D. Vedova, and R. Dondi, “Reconciling a Gene Tree to a Species Tree under the Duplication Cost Model,” Theoretical Computer Science, vol. 347, nos.1/2, pp.36-53, 2005.
[27] P. Górecki and J. Tiuryn, “On the Structure of Reconciliations,” Proc. Research in Computational Molecular Biology (RECOMB) Comparative Genomics Workshop, pp. 42-52, Oct. 2004.
[28] M.A. Bender and M. Farach-Colton, “The LCA Problem Revisited,” Proc. Latin Am. Theoretical Informatics Symp. (LATIN), pp.88-94, 2000.
[29] J.B. Slowinski, A. Knight, and A.P. Rooney, “Inferring Species Trees from Gene Trees: A Phylogenetic Analysis of the Elapidae (Serpentes) Based on the Amino Acid Sequences of Venom Proteins,” Molecular Phylogenetics and Evolution, vol. 8, pp.349-362, 1997.
[30] R.D.M. Page, “Extracting Species Trees from Complex Gene Trees: Reconciled Trees and Vertebrate Phylogeny,” Molecular Phylogenetics and Evolution, vol. 14, pp.89-106, 2000.
[31] R.D.M. Page and J. Cotton, “Vertebrate Phylogenomics: Reconciled Trees and Gene Duplications,” Proc. Pacific Symp. Biocomputing, pp.536-547, 2002.
[32] J.A. Cotton and R.D.M. Page, “Tangled Tales from Multiple Markers: Reconciling Conflict between Phylogenies to Build Molecular Supertrees,” Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, pp.107-125, Springer-Verlag, 2004.
[33] M.J. Sanderson and M.M. McMahon, “Inferring Angiosperm Phylogeny from EST Data with Widespread Gene Duplication,” BMC Evolutionary Biology, vol. 7, (Suppl. 1): S3, 2007.
[34] M. Fellows, M. Hallett, C. Korostensky, and U. Stege, “Analogs & Duals of the MAST Problem for Sequences & Trees,” Proc. European Symp. Algorithms (ESA), pp.103-114, 1998.
[35] U. Stege, “Gene Trees and Species Trees: The Gene-Duplication Problem is Fixed-Parameter Tractable,” Proc. Algorithms and Data Structures Symp. (WADS), pp.288-293, 1999.
[36] M.T. Hallett and J. Lagergren, “New Algorithms for the Duplication-Loss Model,” Proc. Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB), pp.138-146, 2000.
[37] B. DasGupta, X. He, T. Jiang, M. Li, J. Tromp, and L. Zhang, “On Distances between Phylogenetic Trees,” Proc. Symp. Discrete Algorithms (SODA), pp.427-436, 1997.
[38] A. Wehe, M.S. Bansal, J.G. Burleigh, and O. Eulenstein, “DupTree: A Program for Large-Scale Phylogenetic Analyses Using Gene Tree Parsimony,” Bioinformatics, vol. 24, no. 13, pp.1540-1541, 2008.
6 ms
(Ver 2.0)

Marketing Automation Platform Marketing Automation Tool