Issue No. 03, May-June 2012 (vol. 9)
pp: 774-787
Cheng Yuan , Dept. of Comput. Sci. & Eng., Michigan State Univ., East Lansing, MI, USA
ABSTRACT
Detecting members of known noncoding RNA (ncRNA) families in genomic DNA is an important part of sequence annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for genome-wide search. This cost can be reduced by using a filter to exclude sequences that are unlikely to contain the ncRNA of interest, so that the CM is applied only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect ncRNA instances lacking strong conservation while excluding most irrelevant sequences remains challenging. In this work, we design three types of filters based on multiple secondary structure profiles (SSPs). An SSP augments a regular profile (i.e., a position weight matrix) with secondary structure information but can still be scanned efficiently against long sequences. Multi-SSP-based filters combine evidence from multiple SSP matches and can achieve high sensitivity and specificity. Our SSP-based filters are extensively tested on the BRAliBase III data set, Rfam 9.0, and a published soil metagenomic data set. In addition, we compare the SSP-based filters with several other ncRNA search tools, including Infernal (with profile HMMs as filters), ERPIN, and tRNAscan-SE. Our experiments demonstrate that carefully designed SSP filters can achieve a significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families. The designed filters and filter-scanning programs are available at our website: www.cse.msu.edu/~yannisun/ssp/.
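The filtering idea described above can be illustrated with a toy sketch. It is not the authors' SSP construction, scoring scheme, or filter design. The names (`pwm_from_consensus`, `ssp_score`, `ssp_scan`, `multi_ssp_filter`), the score values, and the ordered-chaining rule used to combine SSP matches are all assumptions chosen for illustration. Here each SSP is a position weight matrix plus the base pairs predicted by a consensus structure. A window earns extra score when those paired positions can actually pair (Watson-Crick or G-U), and only regions where every SSP matches in order would be passed on to the expensive CM search, which is not implemented here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical SSP representation: a per-position score table (a PWM over
# A, C, G, U) plus the base pairs, as index pairs within the window, that a
# consensus secondary structure predicts. Illustrative only.
Profile = List[Dict[str, float]]
SSP = Tuple[Profile, List[Tuple[int, int]]]

# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                   ("C", "G"), ("G", "U"), ("U", "G")}


def pwm_from_consensus(consensus: str, match: float = 2.0,
                       mismatch: float = -1.0) -> Profile:
    """Toy profile: reward the consensus base, penalize the other three."""
    return [{b: (match if b == c else mismatch) for b in "ACGU"}
            for c in consensus]


def ssp_score(window: str, ssp: SSP, pair_bonus: float = 1.0,
              pair_penalty: float = -1.0) -> float:
    """Profile score plus a bonus or penalty for each predicted base pair."""
    profile, pairs = ssp
    # Unknown symbols (e.g. N) get a fixed -2.0 penalty.
    score = sum(col.get(base, -2.0) for col, base in zip(profile, window))
    for i, j in pairs:
        paired = (window[i], window[j]) in CANONICAL_PAIRS
        score += pair_bonus if paired else pair_penalty
    return score


def ssp_scan(seq: str, ssp: SSP, threshold: float) -> List[Tuple[int, float]]:
    """Slide the SSP along seq; keep (start, score) for windows >= threshold."""
    width = len(ssp[0])
    hits = []
    for start in range(len(seq) - width + 1):
        s = ssp_score(seq[start:start + width], ssp)
        if s >= threshold:
            hits.append((start, s))
    return hits


def multi_ssp_filter(seq: str, ssps: List[SSP], threshold: float,
                     max_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) regions where every SSP matches in order, each
    match starting at most max_gap bases after the previous match ends.
    Only these regions would be handed to the CM."""
    hits = [ssp_scan(seq, ssp, threshold) for ssp in ssps]

    def chain_end(k: int, prev_end: int) -> Optional[int]:
        # Depth-first search for any valid continuation of the chain.
        if k == len(ssps):
            return prev_end
        for start, _ in hits[k]:
            if prev_end <= start <= prev_end + max_gap:
                end = chain_end(k + 1, start + len(ssps[k][0]))
                if end is not None:
                    return end
        return None

    regions = []
    for start, _ in hits[0]:
        end = chain_end(1, start + len(ssps[0][0]))
        if end is not None:
            regions.append((start, end))
    return regions


# Usage: two hypothetical hairpin SSPs and a short test sequence.
hairpin_a = (pwm_from_consensus("GGGAAACCC"), [(0, 8), (1, 7), (2, 6)])
hairpin_b = (pwm_from_consensus("CCCUUUGGG"), [(0, 8), (1, 7), (2, 6)])
seq = "UUUU" + "GGGAAACCC" + "AA" + "CCCUUUGGG" + "UUUU"
print(multi_ssp_filter(seq, [hairpin_a, hairpin_b], threshold=15.0, max_gap=5))
# → [(4, 24)]
```

Because each window is scored with a fixed number of table lookups and pair checks, the scan in this sketch is linear in sequence length for a fixed profile width. That property is what lets a profile-style filter run much more cheaply than applying the CM at every position.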
INDEX TERMS
Web sites, biology computing, covariance analysis, DNA, filters, genomics, molecular biophysics, physiological models, RNA, website, designing filters, fast-known RNA identification, noncoding RNA families, genomic DNA, sequence annotation, covariance model, genome-wide search, multiple secondary structure profiles, secondary structure information, SSP-based filters, soil metagenomic data set, Sensitivity, Bioinformatics, Hidden Markov models, Dynamic programming, Algorithm design and analysis, Heuristic algorithms, formal languages, Algorithms for data and knowledge, bioinformatics (genome or protein), feature extraction or construction
CITATION
Cheng Yuan, "Designing Filters for Fast-Known NcRNA Identification", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 3, pp. 774-787, May-June 2012, doi:10.1109/TCBB.2011.149