Finding Clones with Dup: Analysis of an Experiment
September 2007 (vol. 33 no. 9)
pp. 608-621
An experiment was carried out by a group of scientists to compare different tools and techniques for detecting duplicated or near-duplicated source code. The overall comparative results are presented elsewhere. This paper takes a closer look at the results for one tool, Dup, which finds code sections that are textually the same or the same except for a systematic substitution of parameters such as identifiers and constants. Various factors that influenced the results are identified, and their impact is assessed by rerunning Dup with changed options and modifications. These changes improve Dup's performance with regard to the experiment and could be incorporated into a postprocessor to be used with other tools.
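To make "the same except for systematic substitution of parameters" concrete, the sketch below checks whether two code fragments match under a consistent one-to-one renaming of parameters. It uses a "previous occurrence" encoding: each parameter token is replaced by its distance back to the previous occurrence of the same token (0 on first occurrence), and two fragments match when their encodings are equal. This is a minimal illustration under stated assumptions, not Dup's implementation. The toy tokenizer, the `KEYWORDS` set, and the helper names `tokenize`, `is_param`, `prev_encode`, and `parameterized_match` are all hypothetical. The sketch also treats every identifier and integer literal as a parameter and compares only two given fragments, rather than searching a whole code base for all duplicates.

```python
import re

# Hypothetical toy tokenizer: identifiers, integer literals, and single
# non-space characters. Keywords are NOT parameters and must match exactly.
KEYWORDS = {"if", "else", "while", "for", "return", "int"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    return TOKEN_RE.findall(code)


def is_param(tok):
    # Assumption: any non-keyword identifier or integer literal is a parameter.
    return (tok[0].isalpha() or tok[0] == "_" or tok.isdigit()) and tok not in KEYWORDS


def prev_encode(tokens):
    """Replace each parameter by the distance to its previous occurrence
    (0 on first occurrence); keep all other tokens verbatim."""
    last_seen = {}
    out = []
    for i, tok in enumerate(tokens):
        if is_param(tok):
            out.append(i - last_seen[tok] if tok in last_seen else 0)
            last_seen[tok] = i
        else:
            out.append(tok)
    return out


def parameterized_match(a, b):
    """True if b can be obtained from a by a one-to-one renaming of parameters."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta) == len(tb) and prev_encode(ta) == prev_encode(tb)


if __name__ == "__main__":
    print(parameterized_match("x = x + y;", "total = total + step;"))  # True
    print(parameterized_match("x = x + y;", "x = y + y;"))            # False
```

Because the encoding records only where each parameter was last seen, renaming `x` to `total` and `y` to `step` leaves it unchanged. By contrast, `x = y + y;` reuses a parameter in a different pattern, so it is not a parameterized duplicate of `x = x + y;`.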


Index Terms:
Redundant code, duplicated code, software clones
Citation:
Brenda S. Baker, "Finding Clones with Dup: Analysis of an Experiment," IEEE Transactions on Software Engineering, vol. 33, no. 9, pp. 608-621, Sept. 2007, doi:10.1109/TSE.2007.70720