|
| This Article | ||
| ||
| Share | ||
| Bibliographic References | ||
| Add to: | ||
| | ||
| Search | ||
| ||
17th International Conference on Database and Expert Systems Applications (DEXA'06)
Invited Paper: Pattern Matching Using n-Gram Sampling of Cumulative Algebraic Signatures: Preliminary Results
Krakow, Poland
September 04-September 08
ISBN: 0-7695-2641-1
| ASCII Text | x | ||
| Witold Litwin, Riad Mokadem, Philippe Rigaux, Schwarz Thomas, "Invited Paper: Pattern Matching Using n-Gram Sampling of Cumulative Algebraic Signatures: Preliminary Results," 2012 23rd International Workshop on Database and Expert Systems Applications, pp. 251-255, 17th International Conference on Database and Expert Systems Applications (DEXA'06), 2006. | |||
| BibTex | x | ||
| @article{ 10.1109/DEXA.2006.80, author = {Witold Litwin and Riad Mokadem and Philippe Rigaux and Schwarz Thomas}, title = {Invited Paper: Pattern Matching Using n-Gram Sampling of Cumulative Algebraic Signatures: Preliminary Results}, journal ={2012 23rd International Workshop on Database and Expert Systems Applications}, volume = {0}, year = {2006}, issn = {1529-4188}, pages = {251-255}, doi = {http://doi.ieeecomputersociety.org/10.1109/DEXA.2006.80}, publisher = {IEEE Computer Society}, address = {Los Alamitos, CA, USA}, } | |||
| RefWorks Procite/RefMan/Endnote | x | ||
| TY - CONF JO - 2012 23rd International Workshop on Database and Expert Systems Applications TI - Invited Paper: Pattern Matching Using n-Gram Sampling of Cumulative Algebraic Signatures: Preliminary Results SN - 1529-4188 SP251 EP255 A1 - Witold Litwin, A1 - Riad Mokadem, A1 - Philippe Rigaux, A1 - Schwarz Thomas, PY - 2006 KW - SDDS KW - grid KW - structured P2P KW - scalable distributed pattern matching KW - algebraic signatures. VL - 0 JA - 2012 23rd International Workshop on Database and Expert Systems Applications ER - | |||
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DEXA.2006.80
We propose a novel string (pattern) matching algorithm called n-gram search. We intend it for the records stored once and searched many times in a database or a file, especially those organized in a Scalable Distributed Data Structure, (SDDS), over a grid or a structured P2P net. We presume that the records are encoded into their cumulative algebraic signatures, providing incidental confidentiality of stored data. The search starts with pre-processing the pattern, calculating the logarithmic algebraic signature (LAS) of the pattern and the LASs of every n-gram in it. The value of n \geqslant 1 is a parameter that one may tune. The search attempts to match the LASs of n-grams in the pattern towards dynamically calculated LASs, sampled over n-grams in the records. A mismatch generates a shift of up to K-n symbols towards next sample, where K is the pattern length. The whole process is parallel over the SDDS servers and does not require any local decoding. For an M-symbol long record, the unsuccessful search, measured as number of match attempts, costs O ((M-K) / (Kn+ 1)). The 2-grams should typically suffice, leading to O ((M-K) / (K-1)). We show that the algorithm particularly efficient for larger strings and records, i.e., with e-documents or DNA data. Preliminary results show then the n-gram search about (K - n + 1) faster than our previous algorithms and among the fastest known, e.g., probably often faster than Boyer-Moore. % MathType!MTEF!2!1!+- % feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn % hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr % 4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq-Jc9 % vqaqpepm0xbba9pwe9Q8fs0-yqaqpepae9pg0FirpepeKkFr0xfr-x % fr-xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeyyzImlaaa!37B3! \[ \geqslant \]
Index Terms:
SDDS, grid, structured P2P, scalable distributed pattern matching, algebraic signatures.
Citation:
Witold Litwin, Riad Mokadem, Philippe Rigaux, Schwarz Thomas, "Invited Paper: Pattern Matching Using n-Gram Sampling of Cumulative Algebraic Signatures: Preliminary Results," dexa, pp.251-255, 17th International Conference on Database and Expert Systems Applications (DEXA'06), 2006
Usage of this product signifies your acceptance of the Terms of Use.
