Learn&Fuzz: Machine Learning for Input Fuzzing
2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE) (2017)
Urbana, IL, USA
Oct. 30, 2017 to Nov. 3, 2017
ISBN: 978-1-5386-3976-4
pp: 50-59
Patrice Godefroid, Microsoft Research, USA
Hila Peleg, Technion, Israel
Rishabh Singh, Microsoft Research, USA
ABSTRACT
Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely, the PDF parser embedded in Microsoft's new Edge browser. We discuss and measure the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of well-formed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability distribution to intelligently guide where to fuzz inputs.
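The last sentence of the abstract describes using a learnt input probability distribution to decide *where* to fuzz. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes that a character-level bigram model stands in for the paper's recurrent neural network, and that fuzzing is applied where the model is most confident, by replacing the likely character with the least likely one. The exact rule used in the paper may differ. All names (`train_bigram`, `sample_fuzz`, `p_threshold`, `fuzz_rate`) are hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "\x02", "\x03"  # hypothetical start/end-of-input markers


def train_bigram(samples):
    """Estimate P(next_char | prev_char) from sample inputs.

    Hypothetical stand-in for the learnt (RNN) input distribution.
    """
    counts = defaultdict(Counter)
    for s in samples:
        prev = START
        for ch in s + END:
            counts[prev][ch] += 1
            prev = ch
    return {
        prev: {ch: n / sum(ctr.values()) for ch, n in ctr.items()}
        for prev, ctr in counts.items()
    }


def sample_fuzz(model, alphabet, p_threshold=0.9, fuzz_rate=0.1,
                max_len=200, rng=None):
    """Generate one input, fuzzing only where the model is confident."""
    rng = rng or random.Random()
    out = []
    prev = START
    while len(out) < max_len:
        dist = model.get(prev)
        if not dist:
            break
        chars = list(dist)
        sampled = rng.choices(chars, weights=[dist[c] for c in chars])[0]
        if sampled == END:
            break
        ch = sampled
        # Learning says "emit the likely character"; fuzzing says "break the
        # structure". Only break it where the model is confident (p > threshold),
        # and only with probability fuzz_rate.
        if dist[sampled] > p_threshold and rng.random() < fuzz_rate:
            # Replace with the least likely character in the alphabet
            # (characters never seen in this context have probability 0).
            ch = min(alphabet, key=lambda c: dist.get(c, 0.0))
        out.append(ch)
        # Condition on the character the model sampled, so a single injected
        # anomaly does not derail the rest of the generated input.
        prev = sampled
    return "".join(out)
```

For example, `sample_fuzz(train_bigram(["ab", "ab"]), "abc", fuzz_rate=0.0)` reproduces the well-formed input `"ab"`, while raising `fuzz_rate` trades well-formedness for more unexpected inputs. This mirrors the tension between learning and fuzzing goals that the abstract describes.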
INDEX TERMS
Portable document format, Grammar, Training, Probability distribution, Recurrent neural networks
CITATION

P. Godefroid, H. Peleg and R. Singh, "Learn&Fuzz: Machine learning for input fuzzing," 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), Urbana, IL, USA, 2017, pp. 50-59.
doi:10.1109/ASE.2017.8115618