loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Finding Significant Matches of Position Weight Matrices in Linear Time
PrePrint
ISSN: 1545-5963
Cinzia Pizzi, University of Padova, Padova
Pasi Rastas, University of Helsinki, Helsinki
Esko Ukkonen, University of Helsinki, Helsinki
Position weight matrices are an important method for modeling signals or motifs in biological sequences, both in DNA and protein contexts. In this paper we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the on--line type, and they generalize classical multi-pattern matching, filtering, and super-alphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple matrix extensions that perform the search for several matrices in one scan through the sequence database. Experimental performance evaluation is provided to compare the new techniques against each other as well as against some other on--line and index--based algorithms proposed in the literature. Compared to the brute-force $O(mn)$ approach, our solutions can be faster by a factor that is proportional to the matrix length $m$. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches ($p$ = 0.0001) of the 123 JASPAR matrices in the human genome in about 18 minutes.
Index Terms:
Pattern matching, Nonnumerical Algorithms and Problems, Analysis of Algorithms and Problem Complexity, Bioinformatics (genome or protein) databases, Database Applications, Database Management, Information Tec
Citation:
Cinzia Pizzi, Pasi Rastas, Esko Ukkonen, "Finding Significant Matches of Position Weight Matrices in Linear Time," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12 Mar. 2009. IEEE computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.35>
Usage of this product signifies your acceptance of the Terms of Use.