Comparison and Evaluation of Clone Detection Tools
September 2007 (vol. 33 no. 9)
pp. 577-591
Rainer Koschke, IEEE Computer Society
Jens Krinke, IEEE Computer Society
Many techniques for detecting duplicated source code (software clones) have been proposed in the past. However, it is not yet clear how these techniques compare in terms of recall and precision as well as space and time requirements. This paper presents an experiment that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC). Their clone candidates were evaluated by one of the authors as an independent third party. The selected techniques cover the whole spectrum of the state of the art in clone detection. The techniques work on text, lexical and syntactic information, software metrics, and program dependency graphs.

Index Terms:
Redundant code, duplicated code, software clones
Citation:
Stefan Bellon, Rainer Koschke, Giulio Antoniol, Jens Krinke, Ettore Merlo, "Comparison and Evaluation of Clone Detection Tools," IEEE Transactions on Software Engineering, vol. 33, no. 9, pp. 577-591, Sept. 2007, doi:10.1109/TSE.2007.70725