Issue No.01 - January-March (2009 vol.6)
pp: 7-21
Kevin Liu, The University of Texas at Austin, Austin
Serita Nelesen, The University of Texas at Austin, Austin
Sindhu Raghavan, The University of Texas at Austin, Austin
C. Randal Linder, The University of Texas at Austin, Austin
Tandy Warnow, The University of Texas at Austin, Austin
ABSTRACT
Several methods have been developed for simultaneous estimation of alignment and tree, of which POY is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to estimating the alignment using ClustalW and then analyzing the resultant alignment using maximum parsimony. They found that ClustalW+MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques are not competitive with two-phase techniques. Our paper presents a simulation study in which we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Our study considers the impact of the gap penalty and suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs. Our study suggests that optimizing under an affine gap penalty might produce alignments that are better than ClustalW alignments, and competitive with those produced by the best current alignment methods. We also show that optimizing under this affine gap penalty produces trees whose topological accuracy is better than ClustalW+MP, and competitive with the current best two-phase methods.
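To make the role of the gap penalty concrete, the following minimal Python sketch scores a fixed alignment/tree pair under a simple and an affine gap penalty. It is an illustration only, not the paper's or POY's implementation. The function names, the toy sequences, and the penalty values are all hypothetical, and it assumes the common convention that a gap run of length k costs gap_open + k * gap_extend, with treelength taken as the sum of pairwise edit costs over the edges of the tree.

```python
from typing import Dict, List, Tuple


def pairwise_cost(a: str, b: str, sub: float = 1.0,
                  gap_open: float = 0.0, gap_extend: float = 1.0) -> float:
    """Cost of one tree edge, given the two aligned rows at its endpoints.

    A gap run of length k costs gap_open + k * gap_extend (assumed convention).
    With gap_open = 0 this reduces to a simple per-character gap penalty.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    cost = 0.0
    state = None  # row that held the gap in the previous scored column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # both rows gapped: no event on this edge
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if side != state:
                cost += gap_open  # a new gap run starts here
            cost += gap_extend
            state = side
        else:
            if x != y:
                cost += sub  # substitution
            state = None
    return cost


def treelength(rows: Dict[str, str], edges: List[Tuple[str, str]],
               **penalties: float) -> float:
    """Sum of edge costs over the tree, with aligned rows at every node."""
    return sum(pairwise_cost(rows[u], rows[v], **penalties) for u, v in edges)


# Toy example: internal node X with two leaves A and B (hypothetical data).
rows = {"X": "ACGTTTAC", "A": "ACGT--AC", "B": "ACG-TTAC"}
edges = [("X", "A"), ("X", "B")]

simple = treelength(rows, edges)                # gap_open = 0
affine = treelength(rows, edges, gap_open=4.0)  # hypothetical opening cost
print(simple, affine)
```

In this toy case the two-character gap on edge X-A costs twice as much as the one-character gap on edge X-B under the simple penalty, whereas the affine penalty charges the opening cost once per gap run, so the choice of gap penalty changes which alignment/tree pairs have the shortest treelength.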
INDEX TERMS
Markov processes, Biology and genetics
CITATION
Kevin Liu, Serita Nelesen, Sindhu Raghavan, C. Randal Linder, Tandy Warnow, "Barking Up The Wrong Treelength: The Impact of Gap Penalty on Alignment and Tree Accuracy", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.6, no. 1, pp. 7-21, January-March 2009, doi:10.1109/TCBB.2008.63