2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE) (2017)
Urbana, IL, USA
Oct. 30, 2017 to Nov. 3, 2017
ISBN: 978-1-5386-3976-4
pp: 319-330
Shuai Wang, The Pennsylvania State University, University Park, PA 16802, USA
Dinghao Wu, The Pennsylvania State University, University Park, PA 16802, USA
ABSTRACT
Detecting similar functions in binary executables is a foundation for many binary code analysis and reuse tasks. To date, recognizing similar components in binary code remains a challenge. Existing research employs either static or dynamic approaches to capture syntax-level or semantics-level program features for comparison. However, previous work has multiple design limitations that lead to relatively high cost, low accuracy, and poor scalability, and thus severely impede practical use. In this paper, we present a novel method that leverages in-memory fuzzing for binary code similarity analysis. Our prototype tool, IMF-SIM, applies in-memory fuzzing to every function and collects traces of different kinds of program behaviors. The similarity score of two behavior traces is computed from their longest common subsequence. To compare two functions, a feature vector is generated whose elements are the similarity scores of the behavior trace-level comparisons. We train a machine learning model on labeled feature vectors; later, given the feature vector produced by comparing two functions, the trained model outputs a final score representing the similarity of the two functions. We evaluate IMF-SIM on over one thousand binary executables in total, compiled with different compilers, optimizations, and commonly used obfuscation methods. Our evaluation shows that IMF-SIM notably outperforms existing tools, with higher accuracy and broader application scope.
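The trace comparison and feature-vector steps described in the abstract can be illustrated with a minimal sketch. This is not IMF-SIM's implementation. The abstract says only that the score is computed "according to" the longest common subsequence, so the normalization by the longer trace length is an assumption. The behavior-kind names ("mem_read", "call"), the integer trace encoding, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def trace_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Similarity of two behavior traces in [0, 1].

    Assumption: the LCS length is divided by the longer trace length.
    The abstract does not specify the normalization.
    """
    if not a and not b:
        return 1.0  # two empty traces are treated as identical
    return lcs_length(a, b) / max(len(a), len(b))


def feature_vector(f: Dict[str, List[int]], g: Dict[str, List[int]]) -> List[float]:
    """One similarity score per behavior kind, in sorted key order.

    The behavior kinds are hypothetical dictionary keys; a kind missing
    from one function is compared as an empty trace.
    """
    kinds = sorted(set(f) | set(g))
    return [trace_similarity(f.get(k, []), g.get(k, [])) for k in kinds]


if __name__ == "__main__":
    func_a = {"mem_read": [1, 2, 3], "call": [7]}
    func_b = {"mem_read": [1, 3], "call": [8]}
    print(feature_vector(func_a, func_b))
```

In the pipeline the abstract describes, vectors like this one would be labeled and used to train a classifier, whose output serves as the final function-level similarity score.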
INDEX TERMS
Binary codes, Tools, Runtime, Indexes, Syntactics
CITATION

S. Wang and D. Wu, "In-memory fuzzing for binary code similarity analysis," 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), Urbana, IL, USA, 2017, pp. 319-330.
doi:10.1109/ASE.2017.8115645