Issue No.04 - April (2011 vol.60)
pp: 594-601
Ying-Dar Lin , National Chiao Tung University, Hsinchu, Taiwan
Yuan-Cheng Lai , National Taiwan University of Science and Technology, Taipei, Taiwan
ABSTRACT
Virus scanning involves computationally intensive string matching against a large number of signatures with differing characteristics. This variety complicates the choice of matching algorithm, because each algorithm outperforms the others only for certain signature characteristics. We propose a hybrid approach for virus scanning in the open-source ClamAV that partitions the signatures into long and short ones. The Backward Hashing algorithm, an enhancement of the Wu-Manber (WM) algorithm, handles only the long patterns to lengthen the average skip distance, while the Aho-Corasick (AC) algorithm scans for only the short patterns to reduce the automaton size. The former uses the bad-block heuristic to exploit long shift distances and reduce the verification frequency, so it is much faster than the original WM implementation in ClamAV. The latter increases AC performance by around 50 percent due to better cache locality. We also rank the factors that affect string matching performance by their importance.
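The partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Backward Hashing algorithm and not ClamAV code: the long-pattern path below uses a plain Wu-Manber-style shift table as a stand-in, and the names `THRESHOLD`, `BLOCK`, `scan_long`, `scan_short`, and `hybrid_scan`, as well as the cutoff value of 4 bytes, are assumptions chosen for illustration only.

```python
from collections import deque

BLOCK = 2      # assumed block size for the shift table
THRESHOLD = 4  # hypothetical cutoff between short and long signatures


def scan_long(text, patterns):
    """Wu-Manber-style scan: skip ahead using a block shift table."""
    if not patterns:
        return set()
    m = min(len(p) for p in patterns)   # window = shortest long pattern
    default = m - BLOCK + 1             # shift for blocks seen in no pattern
    shift, buckets = {}, {}
    for p in patterns:
        for j in range(BLOCK, m + 1):   # j = end of a block inside the prefix
            blk = p[j - BLOCK:j]
            shift[blk] = min(shift.get(blk, default), m - j)
        buckets.setdefault(p[m - BLOCK:m], []).append(p)
    hits, pos = set(), m                # pos = exclusive end of the window
    while pos <= len(text):
        blk = text[pos - BLOCK:pos]
        s = shift.get(blk, default)
        if s == 0:                      # candidate: verify bucketed patterns
            start = pos - m
            for p in buckets.get(blk, []):
                if text.startswith(p, start):
                    hits.add((start, p))
            s = 1
        pos += s
    return hits


def scan_short(text, patterns):
    """Aho-Corasick scan over a small automaton built from short patterns."""
    if not patterns:
        return set()
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:                  # build the trie
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(p)
    queue = deque(goto[0].values())
    while queue:                        # breadth-first failure links
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    hits, state = set(), 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            hits.add((i - len(p) + 1, p))
    return hits


def hybrid_scan(text, signatures):
    """Route long signatures to the shift-based scan, short ones to AC."""
    long_sigs = [s for s in signatures if len(s) >= THRESHOLD]
    short_sigs = [s for s in signatures if 0 < len(s) < THRESHOLD]
    return scan_long(text, long_sigs) | scan_short(text, short_sigs)


if __name__ == "__main__":
    sigs = [b"abcdef", b"ab", b"cdefy", b"zz"]
    for start, sig in sorted(hybrid_scan(b"xxabcdefyyabzz", sigs)):
        print(start, sig)
```

Each hit is a `(start_offset, signature)` pair. Keeping the short signatures out of the shift table is what lets the window length `m`, and with it the maximum skip, stay large; keeping the long ones out of the automaton keeps its state count small, which is the trade-off the abstract describes.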
INDEX TERMS
String matching, automaton, filtering, virus scanning.
CITATION
Ying-Dar Lin, Yuan-Cheng Lai, "A Hybrid Algorithm of Backward Hashing and Automaton Tracking for Virus Scanning", IEEE Transactions on Computers, vol.60, no. 4, pp. 594-601, April 2011, doi:10.1109/TC.2010.95