Issue No.03 - May/June (2011 vol.8)
pp: 698-709
Alexander K. Hudek, University of Waterloo, Waterloo
Daniel G. Brown, University of Waterloo, Waterloo
We present a pairwise local aligner, FEAST, which uses two new techniques: a sensitive extension algorithm for identifying homologous subsequences, and a descriptive probabilistic alignment model. We also present a new procedure for training alignment parameters and apply it to the human and mouse genomes, producing a better parameter set for these sequences. Our extension algorithm identifies homologous subsequences by considering all evolutionary histories. It has higher maximum sensitivity than Viterbi extensions, and better balances specificity. We model alignments with several submodels, each with unique statistical properties, describing strongly similar and weakly similar regions of homologous DNA. Training parameters using two submodels produces superior alignments, even when we align with only the parameters from the weaker submodel. Our extension algorithm combined with our new parameter set achieves sensitivity 0.59 on synthetic tests. In contrast, LASTZ with default settings achieves sensitivity 0.35 with the same false positive rate. Using the weak submodel as parameters for LASTZ increases its sensitivity to 0.59 with high error. FEAST is available at
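The contrast the abstract draws between Viterbi extensions and an extension that considers all evolutionary histories can be illustrated with a toy pair HMM. The sketch below is not FEAST's implementation: the three-state model (M, X, Y), the parameter values, and the function name pair_hmm_score are all hypothetical, chosen only to show that the same dynamic programming table gives the single best alignment when cells are combined with max (Viterbi) and the total over all alignments when they are combined with sum (the forward algorithm).

```python
# Toy pair HMM: M emits an aligned pair, X emits a base of `a` against a gap,
# Y emits a base of `b` against a gap. All values are illustrative assumptions,
# not FEAST's trained parameters.
DELTA = 0.1      # gap-open probability (hypothetical)
EPSILON = 0.3    # gap-extend probability (hypothetical)
P_MATCH = 0.22   # joint emission for identical bases in M
P_MISMATCH = 0.01  # joint emission for differing bases (4*0.22 + 12*0.01 = 1)
Q = 0.25         # emission of a single base in a gap state

TRANS = {
    "M": {"M": 1 - 2 * DELTA, "X": DELTA, "Y": DELTA},
    "X": {"M": 1 - EPSILON, "X": EPSILON, "Y": 0.0},
    "Y": {"M": 1 - EPSILON, "X": 0.0, "Y": EPSILON},
}


def pair_hmm_score(a, b, combine):
    """Global pair-HMM score of a and b.

    combine=max gives the probability of the single best alignment (Viterbi);
    combine=sum gives the total probability over all alignments (forward).
    """
    n, m = len(a), len(b)
    # table[s][i][j]: score of aligning a[:i] with b[:j], ending in state s
    table = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    table["M"][0][0] = 1.0  # by convention, start in M before any emission
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                emit = P_MATCH if a[i - 1] == b[j - 1] else P_MISMATCH
                table["M"][i][j] = emit * combine(
                    table[s][i - 1][j - 1] * TRANS[s]["M"] for s in "MXY")
            if i > 0:
                table["X"][i][j] = Q * combine(
                    table[s][i - 1][j] * TRANS[s]["X"] for s in "MXY")
            if j > 0:
                table["Y"][i][j] = Q * combine(
                    table[s][i][j - 1] * TRANS[s]["Y"] for s in "MXY")
    return combine(table[s][n][m] for s in "MXY")


best_path = pair_hmm_score("AAT", "AT", max)   # one alignment only
all_paths = pair_hmm_score("AAT", "AT", sum)   # every alignment contributes
```

For "AAT" against "AT", either A of the first sequence can be the one left unaligned, so several alignments carry probability and the summed score exceeds the best single-path score. When many alternative alignments exist, a sum-based score credits evidence of homology that no single path captures on its own.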
HMM, sequence evolution, local alignment, biology and genetics.
Alexander K. Hudek, Daniel G. Brown, "FEAST: Sensitive Local Alignment with Multiple Rates of Evolution", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 3, pp. 698-709, May/June 2011, doi:10.1109/TCBB.2010.76