An Efficient Algorithm for Matching Multiple Patterns
April 1993 (vol. 5 no. 2)
pp. 339-351

An efficient algorithm for matching multiple patterns in a string is described. The algorithm combines the concept of deterministic finite state automata (DFSA) with the Boyer-Moore algorithm to achieve better performance. Experimental results indicate that, in the average case, the algorithm performs pattern matching sublinearly, i.e., it does not need to inspect every character of the string. The analysis shows that the number of characters inspected decreases as the length of the patterns increases, and increases slightly as the total number of patterns increases. When matching a single eight-character pattern in an English string, the algorithm inspects only about 17% of the characters of the string; when the number of patterns is seven, it inspects about 33%. In an actual test, the algorithm running on a SUN 3/160 takes only 3.7 s to search for seven eight-character patterns in a 1.4-Mbyte English text file.
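
The abstract does not give the construction itself, so the following Python sketch is not the authors' algorithm. It illustrates the general idea with a simpler, well-known stand-in: a Horspool-style skip table (a Boyer-Moore variant) combined with a trie over the reversed pattern prefixes, which plays the role of the automaton. The names `build_tables` and `search` and the way inspected positions are counted are assumptions made for this illustration only.

```python
def build_tables(patterns):
    """Build a Horspool-style shift table and a reversed-prefix trie.

    Illustrative only: this is a simplified stand-in, not the
    construction from the paper.
    """
    if not patterns or any(len(p) == 0 for p in patterns):
        raise ValueError("patterns must be non-empty strings")
    m = min(len(p) for p in patterns)  # window length

    # shift[c]: distance from the rightmost occurrence of c (excluding the
    # last window position) to the window end, minimised over all patterns.
    shift = {}
    for p in patterns:
        for j in range(m - 1):
            shift[p[j]] = min(shift.get(p[j], m), m - 1 - j)

    # Trie over the reversed length-m prefix of every pattern; the None key
    # at depth m lists the patterns sharing that prefix.
    trie = {}
    for p in patterns:
        node = trie
        for ch in reversed(p[:m]):
            node = node.setdefault(ch, {})
        node.setdefault(None, []).append(p)
    return m, shift, trie


def search(text, patterns):
    """Return (matches, inspected): matches is a list of (position, pattern),
    inspected is an upper bound on the distinct text positions examined."""
    m, shift, trie = build_tables(patterns)
    matches, inspected = [], set()
    pos, n = 0, len(text)
    while pos + m <= n:
        node = trie
        i = pos + m - 1
        # Scan the window right to left through the trie.
        while i >= pos:
            inspected.add(i)
            node = node.get(text[i])
            if node is None:
                break
            i -= 1
        else:
            # The window equals some length-m prefix: verify full patterns.
            for p in node[None]:
                inspected.update(range(pos, min(n, pos + len(p))))
                if text.startswith(p, pos):
                    matches.append((pos, p))
        # Skip ahead based on the character under the window's last position.
        pos += shift.get(text[pos + m - 1], m)
    return matches, len(inspected)


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    found, touched = search(text, ["quick", "brown", "jumps"])
    print(found)
    print(f"inspected {touched} of {len(text)} characters")
```

On this short example the search touches fewer than half of the text positions, which is the sublinear behaviour the abstract describes. The percentages quoted in the abstract come from the authors' own algorithm and experiments and should not be expected from this sketch.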

Index Terms:
multiple pattern match; deterministic finite state automata; Boyer-Moore algorithm; English string; SUN 3/160; eight-character patterns; English text file; finite automata; pattern recognition; word processing
Citation:
J.-J. Fan, K.-Y. Su, "An Efficient Algorithm for Matching Multiple Patterns," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 2, pp. 339-351, April 1993, doi:10.1109/69.219740