This Article 
 Bibliographic References 
 Add to: 
Efficient Parameterized Algorithms for Biopolymer Structure-Sequence Alignment
October-December 2006 (vol. 3 no. 4)
pp. 423-432
Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predict and search for the structure of new sequences. To identify the structure of remote homologs, the structure-sequence alignment has to consider not only sequence similarity, but also spatially conserved conformations caused by residue interactions and, consequently, is computationally intractable. It is difficult to cope with the inefficiency without compromising alignment accuracy, especially for structure search in genomes or large databases. This paper introduces a novel method and a parameterized algorithm for structure-sequence alignment. Both the structure and the sequence are represented as graphs, where, in general, the graph for a biopolymer structure has a naturally small tree width. The algorithm constructs an optimal alignment by finding in the sequence graph the maximum valued subgraph isomorphic to the structure graph. It has the computational time complexity O(k^{t} N^{2}) for the structure of N residues and its tree decomposition of width t. Parameter k, small in nature, is determined by a statistical cutoff for the correspondence between the structure and the sequence. This paper demonstrates a successful application of the algorithm to RNA structure search used for noncoding RNA identification. An application to protein threading is also discussed.

[1] S. Arnborg and A. Proskurowski, “Linear Time Algorithms for NP-Hard Problems Restricted to Partial $k$ -Trees,” Discrete Applied Math., vol. 23, pp. 11-24, 1989.
[2] S. Arnborg, J. Lagergren, and D. Seese, “Easy Problems for Tree-Decomposable Graphs,” J. Algorithms, vol. 12, pp. 308-340, 1991.
[3] J. Bowie, R. Luthy, and D. Eisenberg, “A Method to Identify Protein Sequences that Fold into a Known Three-Dimensional Structure,” Science, vol. 253, pp. 164-170, 1991.
[4] H.L. Bodlaender, “A Linear Time Algorithm for Finding Tree-Decompositions of Small Treewidth,” SIAM J. Computing, vol. 25, pp. 1305-1317, 1996.
[5] S.H. Bryant and S.F. Altschul, “Statistics of Sequence-Structure Threading,” Current Opinion Structural Biology, vol. 5, pp. 236-244, 1995.
[6] M. Brown and C. Wilson, “RNA Pseudoknot Modeling Using Intersections of Stochastic Context Free Grammars with Applications to Database Search,” Proc. Pacific Symp. Biocomputing, pp.109-125, 1995.
[7] L. Cai, R. Malmberg, and Y. Wu, “Stochastic Modeling of Pseudoknot Structures: A Grammatical Approach,” Bioinformatics, vol. 19, pp. i66-i73, 2003.
[8] T. Dandekar, S. Schuster, B. Snel, M. Huynen, and P. Bork, “Pathway Alignment: Application to the Comparative Analysis of Glycolytic Enzymes,” Biochemical J., vol. 1, pp. 115-24, 1999.
[9] A.T. Dandjinou, N. Lévesque, S. Larose, J. Lucier, S.A. Elela, and R.J. Wellinger, “A Phylogenetically Based Secondary Structure for the Yeast Telomerase RNA,” Current Biology, vol. 14, pp. 1148-1158, 2004.
[10] J.A. Doudna, “Structural Genomics of RNA,” Nature Structural Biology, vol. 7, no. 11 supp., pp. 954-956, 2000.
[11] R. Downey and M. Fellows, Parameterized Complexity. Springer, 1999.
[12] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge Univ. Press, 1998.
[13] S.R. Eddy, “Computational Genomics of Non-Coding RNA Genes,” Cell, vol. 109, pp. 137-140, 2002.
[14] S. Eddy and R. Durbin, “RNA Sequence Analysis Using Covariance Models,” Nucleic Acids Research, vol. 22, pp. 2079-2088, 1994.
[15] D. Eppstein, “Subgraph Isomorphism in Planar Graphs and Related Problems,” J. Graph Algorithms and Applications, vol. 3.3, pp. 1-27, 1999.
[16] S.J. Geobel, B. Hsue, T.F. Dombrowski, and P.S. Masters, “Characterization of the RNA Components of a Putative Molecular Switch in the 3' Untranslated Region of the Murine Coronavirus Genome,” J. Virology, vol. 78, pp. 669-682, 2004.
[17] S. Griffiths-Jones, A. Bateman, M. Marshall, A. Khanna, and S.R. Eddy, “Rfam: An RNA Family Database,” Nucleic Acids Research, vol. 31, pp. 439-441, 2003.
[18] R.J. Klein and S.R. Eddy, “RSEARCH: Finding Homologs of Single Structured RNA Sequences,” BMC Bioinformatics, vol. 4, p. 44, 2003.
[19] R.H. Lathrop, “The Protein Threading Problem with Sequence Amino Acid Interaction Preferences Is NP-Complete,” Protein Eng., vol. 7, pp. 1069-1068, 1994.
[20] R.H. Lathrop, R.G. Rogers Jr, J. Bienkowska, B.K.M. Bryant, L.J. Buturovic, C. Gaitatzes, R. Nambudripad, J.V. White, and T.F. Smith, “Analysis and Algorithms for Protein Sequence-Structure Alignment,” Computational Methods in Molecular Biology, S.L.Salzberg, D.B. Searls, and S. Kasif, eds., Elsevier, 1998.
[21] A. Leaver-Fay, B. Kuhlman, and J. Snoeyink, “An Adaptive Dynamic Programming Algorithm for the Side Chain Placement Problem,” Proc. Pacific Symp. Biocomputing 10, pp. 16-27, 2005.
[22] H.-P. Lenhof, K. Reinert, and M. Vingron, “A Polyhedral Approach to RNA Sequence Structure Alignment,” J. Computational Biology, vol. 5, no. 3, pp. 517-530, 1998.
[23] C. Liu, Y. Song, R. Malmberg, and L. Cai, “Profiling and Searching for RNA Pseudoknot Structures in Genomes,” Lecture Notes in Computer Science, vol. 3515, pp. 968-975, 2005.
[24] T.M. Lowe and S.R. Eddy, “tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence,” Nucleic Acids Research, vol. 25, pp. 955-964, 1997.
[25] S.B. Lyngso and C.N. Pedersen, “RNA Pseudoknot Prediction in Energy-Based Models,” J. Computational Biololgy, vol. 7, no. 3, pp.409-427, 2000.
[26] J. Matousek and R. Thomas, “On the Complexity of Finding Iso- and Other Morphisms for Partial $k$ -Trees,” Discrete Math., vol. 108, pp. 343-364, 1992.
[27] E.M. Marcotte, P. Matteo, H.L. Ng, D.W. Rice, T.O. Yeates, and D. Eisenberg, “Detecting Protein Function and Protein-Protein Interactions from Genome Sequences,” Science, vol. 285, pp. 751-753, 1999.
[28] N. Nameki, B. Felden, J.F. Atkins, R.F. Gesteland, H. Himeno, and A. Muto, “Functional and Structural Analysis of a Pseudoknot Upstream of the Tag-Encoded Sequence in E. coli tmRNA,” J.Molecular Biology, vol. 286, no. 3, pp. 733-744, 1999.
[29] D.D. Pervouchine, “IRIS: Intermolecular RNA Interaction Search,” Genome Informatics, vol. 15, no. 2, pp. 92-101, 2004.
[30] K. Reinert, H.-P. Lenhof, P. Mutzel, K. Mehlhorn, and J.D. Kececioglu, “A Branch-and-Cut Algorithm for Multiple Sequence Alignment,” Proc. First Ann. Int'l Conf. Computational Molecular Biology, pp. 241-250, 1997.
[31] E. Rivas and S.R. Eddy, “Noncoding RNA Gene Detection Using Comparative Sequence Analysis,” BMC Bioinformatics, vol. 2, p. 8, 2001.
[32] N. Robertson and P.D. Seymour, “Graph Minors II. Algorithmic Aspects of Tree-Width,” J. Algorithms, vol. 7, pp. 309-322, 1986.
[33] Y. Song, K. Ellrott, C. Liu, J. Guo, Y. Xu, and L. Cai, “Efficient Protein Threading with Tree Decomposition,” manuscript, 2006.
[34] Y. Song, C. Liu, R. Malmberg, F. Pan, and L. Cai, “Tree Decomposition Based Fast Search of RNA Secondary Structures in Genomes,” Proc. 2005 IEEE Computational Systems Bioinformatics Conf., pp. 223-234, 2005.
[35] Y. Uemura, A. Hasegawa, Y. Kobayashi, and T. Yokomori, “Tree Adjoining Grammars for RNA Structure Prediction,” Theoretical Computer Science, vol. 210, pp. 277-303, 1999.
[36] G. Wang and R.L. Dunbrack Jr., “PISCES: A Protein Sequence Culling Server,” Bioinformatics, vol. 19, pp. 1589-1591, 2003.
[37] D. Xu, M.A. Unseren, Y. Xu, and E.C. Uberbacher, “Sequence-Structure Specificity of a Knowledge Based Energy Function at the Secondary Structure Level,” Bioinformatics, vol. 16, pp. 257-268, 2000.
[38] J. Xu, “Rapid Side-Chain Packing via Tree Decomposition,” Proc. 2005 Int'l Conf. Research in Computational Biology, pp. 423-439, 2005.
[39] J. Xu, F. Jiao, and B. Berger, “A Tree-Decomposition Approach to Protein Structure Prediction,” Proc. 2005 IEEE Computational Systems Bioinformatics Conf., pp. 247-256, 2005.
[40] J. Xu, M. Li, D. Kim, and Y. Xu, “RAPTOR: Optimal Protein Threading by Linear Programming,” J. Bioinformatics and Computational Biology, vol. 1, no. 1, pp. 95-113, 2003.
[41] Y. Xu, Z. Liu, L. Cai, and D. Xu, “Protein Structure Prediction by Protein Threading,” Computational Methods for Protein Structure Prediction and Modeling, Y. Xu, D. Xu and J. Liang, eds., Springer, 2006.
[42] Y. Xu, D. Xu, and E.C. Uberbacher, “An Efficient Computational Method for Globally Optimal Threading,” J. Computational Biology, vol. 5, no. 3, pp. 597-614, 1998.

Index Terms:
Structure-sequence alignment, tree decomposition, parameterized algorithm, dynamic programming, RNA structure homology search, protein threading.
Yinglei Song, Chunmei Liu, Xiuzhen Huang, Russell L. Malmberg, Ying Xu, Liming Cai, "Efficient Parameterized Algorithms for Biopolymer Structure-Sequence Alignment," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 4, pp. 423-432, Oct.-Dec. 2006, doi:10.1109/TCBB.2006.52
Usage of this product signifies your acceptance of the Terms of Use.