Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC'05) (2005)
Sept. 25, 2005 to Sept. 29, 2005
Richard Wettel, Institute e-Austria Timişoara
Radu Marinescu, Institute e-Austria Timişoara
Code duplication is a common problem and a well-known sign of bad design. Consequently, the last decade has produced various solutions and tools that automatically find duplicated blocks of code. However, duplicated fragments rarely remain identical after they are copied; they are often modified here and there. This adaptation usually "scatters" the duplicated code block into a large number of small "islands" of duplication which, when detected and analyzed separately, hide the real magnitude and impact of the duplicated block. In this paper we propose a novel, automated approach for recovering duplication blocks by composing small, isolated fragments of duplication into larger and more relevant duplication chains. We validate both the efficiency and the scalability of the approach by applying it to several well-known open-source case studies and discussing some relevant findings. By recovering such duplication chains, the maintenance engineer is provided with additional cases of duplication that can lead to relevant refactorings and that are usually missed by other detection methods.
code duplication, design flaws, quality assurance
R. Wettel and R. Marinescu, "Archeology of Code Duplication: Recovering Duplication Chains from Small Duplication Fragments," Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC'05), Timisoara, Romania, 2005, pp. 63-70.
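
The abstract describes composing small, isolated duplication fragments into larger duplication chains but does not spell out the procedure. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes exact-match fragments between two files have already been found, and it greedily links consecutive fragments whose separating gap is small in both files. The names `Fragment`, `chain_fragments`, `chain_span_a`, and the `max_gap` threshold are hypothetical and chosen for illustration.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """One exact-duplicate 'island' between file A and file B (hypothetical type)."""
    start_a: int  # first line of the fragment in file A
    start_b: int  # first line of the matching fragment in file B
    length: int   # number of consecutive identical lines


def chain_fragments(fragments: List[Fragment], max_gap: int = 2) -> List[List[Fragment]]:
    """Greedily group fragments into chains.

    A fragment joins the current chain when the number of non-matching
    lines between it and the chain's last fragment is between 0 and
    max_gap in BOTH files. Otherwise it starts a new chain.
    """
    chains: List[List[Fragment]] = []
    for frag in sorted(fragments):  # ordered by start_a, then start_b
        if chains:
            last = chains[-1][-1]
            gap_a = frag.start_a - (last.start_a + last.length)
            gap_b = frag.start_b - (last.start_b + last.length)
            if 0 <= gap_a <= max_gap and 0 <= gap_b <= max_gap:
                chains[-1].append(frag)
                continue
        chains.append([frag])
    return chains


def chain_span_a(chain: List[Fragment]) -> range:
    """Lines of file A covered by a chain, including the modified gaps."""
    first, last = chain[0], chain[-1]
    return range(first.start_a, last.start_a + last.length)


# Illustrative input: three small islands close together, one far away.
fragments = [
    Fragment(10, 40, 3),
    Fragment(14, 44, 4),
    Fragment(19, 50, 2),
    Fragment(100, 7, 5),
]
chains = chain_fragments(fragments, max_gap=2)
```

In this example the first three fragments are separated by at most two modified lines in either file, so they form one chain spanning lines 10 to 20 of file A, while the distant fragment stays on its own. A real detector would also need to handle more than two files and competing chain candidates, which this sketch ignores.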