Issue No.02 - March-April (2013 vol.10)
pp: 274-285
Jikai Lei, Dept. of Comput. Sci. & Eng., Michigan State Univ., East Lansing, MI, USA
Prapaporn Techa-angkoon, Dept. of Comput. Sci. & Eng., Michigan State Univ., East Lansing, MI, USA
Yanni Sun, Dept. of Comput. Sci. & Eng., Michigan State Univ., East Lansing, MI, USA
ABSTRACT
Noncoding RNA (ncRNA) identification is highly important to modern biology. The state-of-the-art method for ncRNA identification is based on comparative genomics, in which evolutionary conservation of sequences and secondary structures provides important evidence for ncRNA search. For ncRNAs with low sequence conservation but high structural similarity, conventional local alignment tools such as BLAST yield low sensitivity. Thus, there is a need for ncRNA search methods that incorporate both sequence and structural similarity. We introduce chain-RNA, a pairwise structural alignment tool that can effectively locate cross-species conserved RNA elements with low sequence similarity. In chain-RNA, stem-loop structures are extracted from dot plots generated by an efficient local-folding algorithm. We then formulate stem alignment as an extended 2D chain problem and solve it with existing chain algorithms. Chain-RNA is tested on a data set containing annotated ncRNA homologs and is applied to novel ncRNA search in a transcriptomic data set. The experimental results show that chain-RNA achieves a better tradeoff between sensitivity and false positive rate in ncRNA prediction than conventional sequence similarity search tools and is more time efficient than structural alignment tools. The source code of chain-RNA can be downloaded at http://sourceforge.net/projects/chain-rna/ or at http://www.cse.msu.edu/~leijikai/chain-rna/.
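To make the 2D chain idea concrete, the following is a minimal sketch of the classic (non-extended) 2D chain problem, not the chain-RNA implementation. It assumes each candidate match is a weighted fragment spanning an interval on each of the two sequences, and that one fragment may precede another only if it ends strictly before the other begins on both sequences. The `Fragment` type, its field names, the weights, and the quadratic-time dynamic program are illustrative assumptions; the abstract does not specify the extended formulation or which chain algorithm is used.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """Hypothetical matched region: [x_start, x_end] on sequence 1,
    [y_start, y_end] on sequence 2, with a similarity weight."""
    x_start: int
    x_end: int
    y_start: int
    y_end: int
    weight: float


def best_chain(fragments: List[Fragment]) -> Tuple[float, List[Fragment]]:
    """Return the maximum total weight of a chain and the chain itself.

    A chain is a list of fragments in which each fragment ends strictly
    before the next one starts on both sequences (collinear, no overlap).
    This is a simple O(n^2) dynamic program for illustration only.
    """
    if not fragments:
        return 0.0, []

    # If a precedes b then a.x_end < b.x_start <= b.x_end, so sorting by
    # x_end guarantees every possible predecessor is processed first.
    order = sorted(fragments, key=lambda f: (f.x_end, f.y_end))
    n = len(order)
    score = [f.weight for f in order]  # best chain weight ending at each fragment
    prev = [-1] * n                    # back-pointers for reconstruction

    for j in range(n):
        for i in range(j):
            a, b = order[i], order[j]
            if a.x_end < b.x_start and a.y_end < b.y_start:
                candidate = score[i] + b.weight
                if candidate > score[j]:
                    score[j] = candidate
                    prev[j] = i

    # Walk the back-pointers from the best-scoring end fragment.
    k = max(range(n), key=score.__getitem__)
    total = score[k]
    chain = []
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return total, chain


if __name__ == "__main__":
    # Made-up fragments purely for demonstration.
    demo = [
        Fragment(0, 4, 0, 4, 3.0),
        Fragment(5, 9, 6, 10, 2.0),
        Fragment(3, 8, 2, 7, 4.0),
        Fragment(10, 14, 11, 15, 1.5),
    ]
    total, chain = best_chain(demo)
    print(total, chain)
```

The quadratic loop keeps the sketch short. Chain algorithms in the literature typically replace the inner loop with a sweep over one coordinate and a range-maximum query over the other to reach sub-quadratic time.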
INDEX TERMS
RNA, chain algorithms, noncoding RNA search, secondary structures, structural alignment
CITATION
Jikai Lei, Prapaporn Techa-angkoon, Yanni Sun, "Chain-RNA: A Comparative ncRNA Search Tool Based on the Two-Dimensional Chain Algorithm", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 274-285, March-April 2013, doi:10.1109/TCBB.2012.137