Issue No.11 - November (2011 vol.60)
pp: 1596-1609
Derek Pao, City University of Hong Kong, Hong Kong
Xing Wang, Peking University, Shenzhen
Xiaoran Wang, City University of Hong Kong, Hong Kong
Cong Cao, City University of Hong Kong, Hong Kong
Yuesheng Zhu, Peking University, Shenzhen
ABSTRACT
A memory-efficient hardware string searching engine for antivirus applications is presented. The proposed QSV method is based on quick sampling of the input stream against fixed-length pattern prefixes, and on-demand verification of variable-length pattern suffixes. Patterns handled by the QSV method are required to have at least 16 bytes and to possess distinct 16-byte prefixes. The latter requirement can be fulfilled by a preprocessing procedure. The search engine uses the pipelined Aho-Corasick (P-AC) architecture developed by the first author to process short patterns of 4 to 15 bytes and a small number of exception cases. Our design was evaluated using the ClamAV virus database, which has 82,888 strings with a total size that exceeds 8 Mbyte. In terms of byte count, 99.3 percent of the pattern set is handled by the QSV method and 0.7 percent of the pattern set is handled by P-AC. A pattern with a distinct 16-byte prefix occupies at most three lookup table entries in QSV. The overall memory cost of our system is about 1.4 Mbyte, i.e., 1.4 bits per character of the ClamAV pattern set. The proposed method is memory-based; hence, updates to the pattern set can be accommodated by modifying the contents of the lookup tables without reconfiguring the hardware circuits.
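The two-phase idea in the abstract (look up a fixed-length prefix, then verify the variable-length suffix only on a prefix hit) can be illustrated with a minimal software sketch. It is not the hardware design: the Python dictionary stands in for the lookup tables, the check at every byte offset is a simplification of the sampling step, and the names `build_table` and `search` are hypothetical.

```python
PREFIX_LEN = 16  # prefix length stated in the abstract


def build_table(patterns):
    """Map each distinct 16-byte prefix to the suffix of its pattern.

    Assumption: a plain dict replaces the hardware lookup tables.
    """
    table = {}
    for pattern in patterns:
        if len(pattern) < PREFIX_LEN:
            raise ValueError("pattern shorter than 16 bytes")
        prefix, suffix = pattern[:PREFIX_LEN], pattern[PREFIX_LEN:]
        if prefix in table:
            raise ValueError("16-byte prefixes must be distinct")
        table[prefix] = suffix
    return table


def search(data, table):
    """Return a list of (offset, pattern) for every match in data."""
    hits = []
    for i in range(len(data) - PREFIX_LEN + 1):
        window = data[i:i + PREFIX_LEN]
        suffix = table.get(window)  # prefix lookup
        if suffix is None:
            continue
        end = i + PREFIX_LEN
        # On-demand verification: compare the suffix only after a prefix hit.
        if data[end:end + len(suffix)] == suffix:
            hits.append((i, window + suffix))
    return hits


patterns = [b"ABCDEFGHIJKLMNOPqrs", b"0123456789abcdefXYZ!"]
table = build_table(patterns)
data = b"..ABCDEFGHIJKLMNOPqrs..0123456789abcdefXYZ?"
hits = search(data, table)
print(hits)
```

In this example the second pattern's prefix occurs in the input but its suffix does not, so only the first pattern is reported. Patterns shorter than 16 bytes are rejected here; in the described system they are handled by the separate P-AC engine.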
INDEX TERMS
String searching, antivirus system, system security, embedded system.
CITATION
Derek Pao, Xing Wang, Xiaoran Wang, Cong Cao, Yuesheng Zhu, "String Searching Engine for Virus Scanning", IEEE Transactions on Computers, vol.60, no. 11, pp. 1596-1609, November 2011, doi:10.1109/TC.2010.250