Issue No.01 - January/February (2012 vol.9)
pp: 1-11
D. DeBlasio, Dept. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
J. Bruand, Bioinf. & Syst. Biol. Grad. Program, Univ. of California, San Diego, La Jolla, CA, USA
Shaojie Zhang, Dept. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
ABSTRACT
Structure-based RNA multiple alignment is particularly challenging because covarying mutations make sequence information alone insufficient. Existing tools for RNA multiple alignment first generate pairwise RNA structure alignments and then build the multiple alignment using only sequence information. Here we present PMFastR, an algorithm that iteratively uses a sequence-structure alignment procedure to build a structure-based RNA multiple alignment from one sequence with known structure and a database of sequences from the same family. PMFastR also has low memory consumption, allowing for the alignment of large sequences such as 16S and 23S rRNA. The algorithm also provides a method to utilize a multicore environment. We present results on benchmark data sets from BRAliBase, which show that PMFastR performs comparably to other state-of-the-art programs. Finally, we regenerate 607 Rfam seed alignments and show that our automated process creates multiple alignments similar to the manually curated Rfam seed alignments. Thus, the techniques presented in this paper allow for the generation of multiple alignments using sequence-structure guidance, while limiting memory consumption. As a result, multiple alignments of long RNA sequences, such as 16S and 23S rRNAs, can easily be generated locally on a personal computer. The software and supplementary data are available at http://genome.ucf.edu/PMFastR.
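The abstract's opening claim, that covarying mutations make sequence information alone insufficient, can be illustrated with a small sketch. This is not PMFastR code: the two toy sequences, the dot-bracket structure, and all helper names are hypothetical and chosen only for illustration. The second sequence carries compensatory mutations at every stem position, so sequence identity drops sharply while every base pair of the shared structure stays intact.

```python
# Illustrative sketch only: the sequences, structure, and helper names are
# hypothetical and are not taken from PMFastR.

# Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(dot_bracket):
    """Return sorted (i, j) index pairs for matching parentheses."""
    stack, pairs = [], []
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)


def sequence_identity(a, b):
    """Fraction of identical positions in two equal-length, ungapped sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairs_conserved(seq, pairs):
    """Count the structure's base pairs that the sequence can still form."""
    return sum((seq[i], seq[j]) in ALLOWED_PAIRS for i, j in pairs)


structure = "((((....))))"   # one hairpin: a 4-bp stem and a 4-nt loop
seq_a = "GGCAGAAAUGCC"
seq_b = "CAUGGAAACAUG"       # every stem position mutated, pairing preserved

pairs = paired_positions(structure)
identity = sequence_identity(seq_a, seq_b)
kept_a = pairs_conserved(seq_a, pairs)
kept_b = pairs_conserved(seq_b, pairs)

print(f"identity: {identity:.2f}, pairs kept: {kept_a} and {kept_b} of {len(pairs)}")
```

In this toy case only the loop positions match, so a purely sequence-based aligner sees little similarity in the stem even though both sequences fold into the same hairpin. That is the kind of signal a sequence-structure alignment procedure is meant to use.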
INDEX TERMS
RNA, Databases, Instruction sets, Memory management, Dynamic programming, Arrays, Bioinformatics, iterative alignment, RNA multiple alignment, RNA secondary structure, RNA sequence-structure alignment
CITATION
D. DeBlasio, J. Bruand, Shaojie Zhang, "A Memory Efficient Method for Structure-Based RNA Multiple Alignment", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 1, pp. 1-11, January/February 2012, doi:10.1109/TCBB.2011.86