Issue No. 04 - April (2011 vol. 60)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2010.95
Yuan-Cheng Lai, National Taiwan University of Science and Technology, Taipei, Taiwan
Po-Ching Lin, National Chung Cheng University, Chiayi, Taiwan
Ying-Dar Lin, National Chiao Tung University, Hsinchu, Taiwan
Virus scanning involves computationally intensive string matching against a large number of signatures with differing characteristics. This variety complicates the choice of matching algorithm, because each approach outperforms the others only for certain signature characteristics. We propose a hybrid approach that partitions the signatures in the open-source ClamAV into long and short ones. The Backward Hashing algorithm, an enhancement of the Wu-Manber (WM) algorithm, handles only the long patterns, which lengthens the average skip distance, while the Aho-Corasick (AC) algorithm scans for only the short patterns, which reduces the automaton size. The former uses the bad-block heuristic to exploit long shift distances and reduce the verification frequency, so it is much faster than the original WM implementation in ClamAV. The latter improves AC performance by around 50 percent thanks to better cache locality. We also rank the factors by their importance to string matching performance.
String matching, automaton, filtering, virus scanning.
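The partitioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Backward Hashing algorithm and not ClamAV's code: the long-pattern scanner below is a plain Wu-Manber-style block shift table, and the function names, the length threshold of 4 bytes, and the block size of 2 bytes are all assumptions made for illustration.

```python
from collections import deque

def partition(signatures, threshold):
    """Split signatures into (long, short); the threshold is an assumed value."""
    long_sigs = [s for s in signatures if len(s) >= threshold]
    short_sigs = [s for s in signatures if len(s) < threshold]
    return long_sigs, short_sigs

def wm_scan(text, patterns, block=2):
    """Wu-Manber-style scan: skip using a block shift table, verify on shift 0."""
    if not patterns:
        return []
    m = min(len(p) for p in patterns)      # only the first m bytes drive shifts
    assert m >= block, "every long pattern must be at least one block long"
    default = m - block + 1                # shift for blocks seen in no pattern
    shift, buckets = {}, {}
    for p in patterns:
        for end in range(block - 1, m):
            blk = p[end - block + 1:end + 1]
            shift[blk] = min(shift.get(blk, default), m - 1 - end)
        buckets.setdefault(p[m - block:m], []).append(p)
    hits, pos = [], m - 1                  # pos is the last byte of the window
    while pos < len(text):
        blk = text[pos - block + 1:pos + 1]
        s = shift.get(blk, default)
        if s == 0:                         # candidate: verify full patterns
            start = pos - m + 1
            hits += [(start, p) for p in buckets[blk] if text.startswith(p, start)]
            s = 1
        pos += s
    return hits

def ac_build(patterns):
    """Build Aho-Corasick goto, failure, and output tables."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        node = 0
        for ch in p:
            if ch not in goto[node]:
                goto.append({}); fail.append(0); out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(p)
    queue = deque(goto[0].values())        # depth-1 nodes fail to the root
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out

def ac_scan(text, patterns):
    """Report (start, pattern) for every occurrence of every short pattern."""
    if not patterns:
        return []
    goto, fail, out = ac_build(patterns)
    hits, node = [], 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        hits += [(i - len(p) + 1, p) for p in out[node]]
    return hits

def hybrid_scan(text, signatures, threshold=4):
    """Long patterns go to the shift-based scanner, short ones to the automaton."""
    long_sigs, short_sigs = partition(signatures, threshold)
    return sorted(wm_scan(text, long_sigs) + ac_scan(text, short_sigs))
```

Calling `hybrid_scan(b"xxabcdefyyabzzhello world", [b"abcdef", b"hello", b"ab", b"zz"])` returns sorted `(offset, signature)` pairs. Keeping short patterns out of the shift table matters because the maximum skip is bounded by the shortest pattern it covers, and keeping long patterns out of the automaton keeps its state count small.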
Yuan-Cheng Lai, Po-Ching Lin, Ying-Dar Lin, "A Hybrid Algorithm of Backward Hashing and Automaton Tracking for Virus Scanning", IEEE Transactions on Computers, vol. 60, no. 4, pp. 594-601, April 2011, doi:10.1109/TC.2010.95