Performance and Architectural Issues for String Matching
February 1990 (vol. 39 no. 2)
pp. 238-250

The authors introduce special heuristics to the Knuth-Morris-Pratt algorithm to reduce the time and space required to perform string matching. They compare their hardware-based approach to the software approaches embodied in the Unix system grep and fgrep commands. Simulation results show that the hardware approach can provide a 25- to 500-fold performance improvement, depending on the complexity of the query, and that it is fast enough, even in the presence of variable-length "don't cares," to keep up with a 20-million-character/second disk. The approach compares favorably to other hardware designs in speed and space. The proposed hardware implementation requires 10 kB of one-cycle static memory, 28 single-character comparators, four 16-bit adders, and control logic for four finite-state machines with a term-matcher controller. Beyond that configuration, additional hardware produces negligible performance improvements for queries with up to 80 terms, about half of which have variable-length "don't cares."
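
For readers unfamiliar with the baseline, the following is a minimal sketch of the standard textbook Knuth-Morris-Pratt algorithm in software. It does not reproduce the authors' heuristics or their hardware design, which the abstract does not describe in detail; the function names build_failure and kmp_search are illustrative assumptions.

```python
def build_failure(pattern):
    """Return the KMP failure table for pattern.

    fail[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it.
    """
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until the next character fits.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start indices of all (possibly overlapping) matches."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, reuse the failure table instead of
        # re-reading text characters.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return matches
```

Each text character is read exactly once, left to right, which is the property that makes the algorithm a natural fit for matching against data streaming off a disk.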

Index Terms:
performance issues; architectural issues; string matching; Knuth-Morris-Pratt algorithm; hardware-based approach; software approaches; Unix system; hardware implementation; control logic; finite-state machines; term-matcher controller; database management systems; finite automata; information retrieval.
Citation:
M.E. Isenman, D.E. Shasha, "Performance and Architectural Issues for String Matching," IEEE Transactions on Computers, vol. 39, no. 2, pp. 238-250, Feb. 1990, doi:10.1109/12.45209