DOI Bookmark:
http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.35
Position weight matrices are an important method for modeling signals or motifs in biological sequences, both in DNA and protein contexts. In this paper we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the on--line type, and they generalize classical multi-pattern matching, filtering, and super-alphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple matrix extensions that perform the search for several matrices in one scan through the sequence database. Experimental performance evaluation is provided to compare the new techniques against each other as well as against some other on--line and index--based algorithms proposed in the literature. Compared to the brute-force $O(mn)$ approach, our solutions can be faster by a factor that is proportional to the matrix length $m$. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches ($p$ = 0.0001) of the 123 JASPAR matrices in the human genome in about 18 minutes.
Index Terms:
Pattern matching, Nonnumerical Algorithms and Problems, Analysis of Algorithms and Problem Complexity, Bioinformatics (genome or protein) databases, Database Applications, Database Management, Information Tec
Citation:
Cinzia Pizzi, Pasi Rastas, Esko Ukkonen, "Finding Significant Matches of Position Weight Matrices in Linear Time," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12 Mar. 2009. IEEE computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.35>
Usage of this product signifies your acceptance of the
Terms of Use.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||