Searching Genomes for Noncoding RNA Using FastR
October-December 2005 (vol. 2 no. 4)
pp. 366-379

Abstract—The discovery of novel noncoding RNAs has been among the most exciting recent developments in biology. It has been hypothesized that there is, in fact, an abundance of functional noncoding RNAs (ncRNAs) with various catalytic and regulatory functions. However, the inherent signal for ncRNA is weaker than the signal for protein coding genes, making these harder to identify. We consider the following problem: Given an RNA sequence with a known secondary structure, efficiently detect all structural homologs in a genomic database by computing the sequence and structure similarity to the query. Our approach, based on structural filters that eliminate a large portion of the database while retaining the true homologs, allows us to search a typical bacterial genome in minutes on a standard PC. The results are two orders of magnitude better than the currently available software for the problem. We applied FastR to the discovery of novel riboswitches, which are a class of RNA domains found in the untranslated regions. They are of interest because they regulate metabolite synthesis by directly binding metabolites. We searched all available eubacterial and archaeal genomes for riboswitches from purine, lysine, thiamin, and riboflavin subfamilies. Our results point to a number of novel candidates for each of these subfamilies and include genomes that were not known to contain riboswitches.
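The filtering idea described in the abstract can be illustrated with a small sketch. This is not FastR's actual filter, whose details are not given here. It is a toy under stated assumptions: a database window is kept only if it contains at least one candidate stem, meaning a k-mer followed within a bounded distance by its exact reverse complement. The names `has_stem` and `filter_windows` and the parameters `k`, `min_loop`, `max_span`, `window_len`, and `step` are all hypothetical.

```python
# Toy structural filter: keep only windows that could contain a helix.
# Assumption: Watson-Crick pairs only (G-U wobble pairs are ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(kmer):
    """Return the reverse complement of an RNA k-mer."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def has_stem(window, k=4, min_loop=3, max_span=40):
    """Return True if some k-mer in the window is followed by its reverse
    complement, with at least min_loop bases between the two halves and
    the whole candidate helix fitting inside max_span bases."""
    n = len(window)
    for i in range(n - k + 1):
        left = window[i:i + k]
        if any(base not in COMPLEMENT for base in left):
            continue  # skip k-mers with ambiguous symbols such as N
        target = reverse_complement(left)
        start = i + k + min_loop          # earliest start of the right half
        end = min(n, i + max_span)        # right half must end before this
        if window.find(target, start, end) != -1:
            return True
    return False


def filter_windows(sequence, window_len=50, step=10, **stem_args):
    """Slide a window over the sequence and return the offsets of the
    windows that pass the stem filter. Only these would be handed to a
    slower sequence-and-structure alignment step."""
    hits = []
    last_start = max(1, len(sequence) - window_len + 1)
    for offset in range(0, last_start, step):
        window = sequence[offset:offset + window_len]
        if has_stem(window, **stem_args):
            hits.append(offset)
    return hits


if __name__ == "__main__":
    genome = "A" * 60 + "GGGGAAAACCCC" + "A" * 60
    print(filter_windows(genome, window_len=20, step=5))
```

In this sketch most windows are discarded by a cheap string scan, so an expensive alignment would run only on the few surviving offsets. A real filter would need to be tuned so that true homologs are not lost.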

Index Terms:
Noncoding RNA, database search, filtration, riboswitch, bacterial genome.
Shaojie Zhang, Brian Haas, Eleazar Eskin, Vineet Bafna, "Searching Genomes for Noncoding RNA Using FastR," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 4, pp. 366-379, Oct.-Dec. 2005, doi:10.1109/TCBB.2005.57