This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
17th International Conference on Database and Expert Systems Applications (DEXA'06)
Invited Paper: Pattern Matching Using n-Gram Sampling of Cumulative Algebraic Signatures: Preliminary Results
Krakow, Poland
September 04-September 08
ISBN: 0-7695-2641-1
Witold Litwin, Dauphine University, France
Riad Mokadem, Dauphine University, France
Philippe Rigaux, Dauphine University, France
Schwarz Thomas, Santa Clara University, California
We propose a novel string (pattern) matching algorithm called n-gram search. We intend it for the records stored once and searched many times in a database or a file, especially those organized in a Scalable Distributed Data Structure, (SDDS), over a grid or a structured P2P net. We presume that the records are encoded into their cumulative algebraic signatures, providing incidental confidentiality of stored data. The search starts with pre-processing the pattern, calculating the logarithmic algebraic signature (LAS) of the pattern and the LASs of every n-gram in it. The value of n \geqslant 1 is a parameter that one may tune. The search attempts to match the LASs of n-grams in the pattern towards dynamically calculated LASs, sampled over n-grams in the records. A mismatch generates a shift of up to K-n symbols towards next sample, where K is the pattern length. The whole process is parallel over the SDDS servers and does not require any local decoding. For an M-symbol long record, the unsuccessful search, measured as number of match attempts, costs O ((M-K) / (Kn+ 1)). The 2-grams should typically suffice, leading to O ((M-K) / (K-1)). We show that the algorithm particularly efficient for larger strings and records, i.e., with e-documents or DNA data. Preliminary results show then the n-gram search about (K - n + 1) faster than our previous algorithms and among the fastest known, e.g., probably often faster than Boyer-Moore. % MathType!MTEF!2!1!+- % feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn % hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr % 4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq-Jc9 % vqaqpepm0xbba9pwe9Q8fs0-yqaqpepae9pg0FirpepeKkFr0xfr-x % fr-xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeyyzImlaaa!37B3! \[ \geqslant \]
Index Terms:
SDDS, grid, structured P2P, scalable distributed pattern matching, algebraic signatures.
Citation:
Witold Litwin, Riad Mokadem, Philippe Rigaux, Schwarz Thomas, "Invited Paper: Pattern Matching Using n-Gram Sampling of Cumulative Algebraic Signatures: Preliminary Results," dexa, pp.251-255, 17th International Conference on Database and Expert Systems Applications (DEXA'06), 2006
Usage of this product signifies your acceptance of the Terms of Use.