2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE) (2017)
Urbana, IL, USA
Oct. 30, 2017 to Nov. 3, 2017
Patrice Godefroid , Microsoft Research, USA
Hila Peleg , Technion, Israel
Rishabh Singh , Microsoft Research, USA
Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely, the PDF parser embedded in Microsoft's new Edge browser. We discuss and measure the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of well-formed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability distribution to intelligently guide where to fuzz inputs.
Portable document format, Grammar, Training, Probability distribution, Recurrent neural networks
P. Godefroid, H. Peleg and R. Singh, "Learn&Fuzz: Machine learning for input fuzzing," 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), Urbana, IL, USA, 2017, pp. 50-59.