loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
2004 IEEE Computational Systems Bioinformatics Conference (CSB'04)
Compressed Pattern Matching in DNA Sequences
Stanford, California
August 16-August 19
ISBN: 0-7695-2194-0
Lei Chen, Wayne State University
Shiyong Lu, Wayne State University
Jeffrey Ram, Wayne State University
We propose derivative Boyer-Moore (d-BM), a new compressed pattern matching algorithm in DNA sequences. This algorithm is based on the Boyer-Moore method, which is one of the most popular string matching algorithms. In this approach, we compress both DNA sequences and patterns by using two bits to represent each A, T, C, G character. Experiments indicate that this compressed pattern matching algorithm searches long DNA patterns (length > 50) more than 10 times faster than the exact match routine of the software package Agrep, which is known as the fastest pattern matching tool. Moreover, compression of DNA sequences by this method gives a guaranteed space saving of 75%. In part the enhanced speed of the algorithm is due to the increased efficiency of the Boyer-Moore method resulting from an increase in alphabet size from 4 to 256.
Citation:
Lei Chen, Shiyong Lu, Jeffrey Ram, "Compressed Pattern Matching in DNA Sequences," csb, pp.62-68, 2004 IEEE Computational Systems Bioinformatics Conference (CSB'04), 2004
Usage of this product signifies your acceptance of the Terms of Use.