2016 IEEE 24th International Conference on Program Comprehension (ICPC) (2016)
Austin, TX, USA
May 16, 2016 to May 17, 2016
Fang-Hsiang Su, Columbia University, New York, USA
Jonathan Bell, Columbia University, New York, USA
Gail Kaiser, Columbia University, New York, USA
Simha Sethumadhavan, Columbia University, New York, USA
Identifying similar code in software systems can assist many software engineering tasks, such as program understanding and software refactoring. While most approaches focus on identifying code that looks alike, some techniques aim at detecting code that functions alike. Detecting these functional clones in object-oriented languages remains an open question because of the difficulty of exposing and comparing programs' functionality effectively. We propose a novel technique, In-Vivo Clone Detection, that detects functional clones in arbitrary programs by identifying and mining their inputs and outputs. The key insight is to use existing workloads to execute programs and then measure functional similarities between programs based on their inputs and outputs, which mitigates the problems in object-oriented languages reported by prior work. We implement this technique in our system, HitoshiIO, which is open source and freely available. Our experimental results show that HitoshiIO detects more than 800 functional clones across a corpus of 118 projects. In a random sample of the detected clones, HitoshiIO achieves a 68+% true positive rate with only a 15% false positive rate.
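The idea of comparing programs by their observed inputs and outputs can be illustrated with a minimal sketch. This is not HitoshiIO's implementation: the class name IoSimilaritySketch, the helper methods, the shared integer workload, and the use of Jaccard similarity over recorded input/output pairs are all assumptions chosen for illustration. The abstract does not specify how the system records I/O or scores similarity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; names and the similarity metric are assumptions.
public class IoSimilaritySketch {

    // Execute a method over a workload and record each observed (input, output) pair.
    static <I, O> Set<String> ioProfile(Function<I, O> method, List<I> workload) {
        Set<String> profile = new HashSet<>();
        for (I input : workload) {
            profile.add(input + " -> " + method.apply(input));
        }
        return profile;
    }

    // Jaccard similarity of two I/O profiles (an assumed metric, not the paper's).
    static double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Two methods that look different but function alike on this workload.
    static int sumLoop(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    static int sumFormula(int n) {
        return n * (n + 1) / 2;
    }

    // A method with different behavior, for contrast.
    static int square(int n) {
        return n * n;
    }

    public static void main(String[] args) {
        List<Integer> workload = List.of(0, 1, 2, 5, 10);
        Set<String> loop = ioProfile(IoSimilaritySketch::sumLoop, workload);
        Set<String> formula = ioProfile(IoSimilaritySketch::sumFormula, workload);
        Set<String> sq = ioProfile(IoSimilaritySketch::square, workload);

        System.out.println(similarity(loop, formula)); // 1.0
        System.out.println(similarity(loop, sq));      // 0.25
    }
}
```

In this toy setting, the loop-based and formula-based sums share every recorded pair and so score 1.0, while the squaring method agrees only on inputs 0 and 1. A real system would have to handle object-typed inputs and outputs and methods exercised by different workloads, which this sketch deliberately ignores.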
Cloning, Java, Syntactics, Software systems, Complexity theory, Input variables
Fang-Hsiang Su, J. Bell, G. Kaiser and S. Sethumadhavan, "Identifying functionally similar code in complex codebases," 2016 IEEE 24th International Conference on Program Comprehension (ICPC), Austin, TX, USA, 2016, pp. 1-10.