Issue No.04 - October-December (2009 vol.6)
pp: 570-582
Changjin Hong, University of Minnesota, Minneapolis
Ahmed H. Tewfik, University of Minnesota, Minneapolis
ABSTRACT
Previously computed similarity results between biological sequences must be recomputed when researchers discover errors in their sequenced data or need to compare closely related sequences, e.g., within a family of proteins. We present an efficient scheme for updating local sequence alignments under an affine gap model. Using the previous alignment result between two amino acid sequences, we perform a forward-backward alignment to generate heuristic search bands that are bounded by a set of suboptimal paths. Given an updated (corrected) sequence, we first predict a new alignment-path score for each contour and select the best candidates among them. We then run the Smith-Waterman algorithm in this confined space. Our heuristic alignment for an updated sequence can be further accelerated with reusable dynamic programming (rDP), our prior work. In this study, we validate the "relative node tolerance bound" (RNTB) in the pruned search space. We also improve computational performance by quantifying the probability that the RNTB tolerance holds and switching to rDP only on perturbation-resilient columns. In a search space derived with a threshold of 90 percent of the optimal alignment score, we find that 98.3 percent of contours contain correctly updated paths. We also find that our method consumes only 25.36 percent of the runtime of the sparse dynamic programming (sDP) method and only 2.55 percent of that of normal dynamic programming with the Smith-Waterman algorithm.
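The idea of running Smith-Waterman with affine gaps inside a confined search space can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `local_align_score`, the `in_band` predicate, the toy match/mismatch scores, and the gap costs are all assumptions made for illustration. In particular, the band here is a simple diagonal strip, whereas the abstract describes bands bounded by suboptimal paths; RNTB and rDP are not modeled at all.

```python
NEG_INF = float("-inf")


def local_align_score(a, b, sub, gap_open=2, gap_extend=1, in_band=None):
    """Best local alignment score with affine gaps (Gotoh-style recurrences).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    in_band(i, j) -> bool restricts which cells are evaluated (hypothetical
    stand-in for a pruned search space); None evaluates the full matrix.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap in a
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap in b
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if in_band is not None and not in_band(i, j):
                continue  # pruned cell: H stays 0, E and F stay -inf
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            diag = H[i - 1][j - 1] + sub(a[i - 1], b[j - 1])
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def toy_sub(x, y):
    """Toy substitution score (assumption): +2 for a match, -1 otherwise."""
    return 2 if x == y else -1


a, b = "AAAATTTT", "AAAACTTTT"
full = local_align_score(a, b, toy_sub)
wide = local_align_score(a, b, toy_sub, in_band=lambda i, j: abs(i - j) <= 1)
narrow = local_align_score(a, b, toy_sub, in_band=lambda i, j: i == j)
print(full, wide, narrow)
```

In this toy case a strip of width one around the main diagonal still contains the optimal gapped path, so it reproduces the full-matrix score while evaluating fewer cells. Restricting the search to the main diagonal alone excludes that path and yields a lower score, which is why the choice of band matters.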
INDEX TERMS
Shortest path, minimum spanning tree, sensitivity analysis, dynamic programming, sequence alignment, string edit, suboptimal paths.
CITATION
Changjin Hong, Ahmed H. Tewfik, "Heuristic Reusable Dynamic Programming: Efficient Updates of Local Sequence Alignment," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 4, pp. 570-582, October-December 2009, doi:10.1109/TCBB.2009.30