2011 27th IEEE International Conference on Software Maintenance (ICSM) (2011)
Williamsburg, VA, USA
Sept. 25, 2011 to Sept. 30, 2011
ISBN: 978-1-4577-0663-9
pp: 283-292
Joel Ossher, Donald Bren School of Information and Computer Sciences, University of California, Irvine, 92697-3425, USA
Hitesh Sajnani, Donald Bren School of Information and Computer Sciences, University of California, Irvine, 92697-3425, USA
Cristina Lopes, Donald Bren School of Information and Computer Sciences, University of California, Irvine, 92697-3425, USA
ABSTRACT
We present a study of the extent to which developers copy entire files or sets of files into their applications with little or no modification. Our aim is to determine the prevalence of such activity within open source Java development, and to identify the circumstances under which files are reused in this manner. To accomplish this aim, we developed a novel method of file-level code clone detection that is scalable to millions of files. We applied our method to the Sourcerer Repository, which contains over 13,000 Java projects aggregated from multiple open source repositories. Our method detected that in excess of 10% of files are clones, and that over 15% of all projects contain at least one cloned file. In addition to computing these raw numbers, we manually examined a large number of the reported clones. We found the most commonly cloned files to be Java extension classes and popular third-party libraries, both large and small. We also discovered a number of projects that occur in multiple online repositories, have been forked, or were divided into multiple subprojects.
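The abstract does not describe the detection method itself. As an illustration only of what file-level clone detection means, the following minimal sketch groups files whose whitespace-normalized content is identical. It is not the authors' method: content hashing is a simple stand-in technique, and the names `fingerprint`, `find_file_clones`, and the sample `corpus` paths are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(source: str) -> str:
    """Hash file content after collapsing whitespace, so that pure
    formatting differences do not hide an otherwise identical copy."""
    normalized = " ".join(source.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_file_clones(files: dict[str, str]) -> list[list[str]]:
    """Group file paths whose normalized content is identical.

    `files` maps a path to that file's text. Only groups with at least
    two members (actual clones) are returned.
    """
    groups = defaultdict(list)
    for path, source in files.items():
        groups[fingerprint(source)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Hypothetical three-file corpus: the two Base64 files differ only in layout.
corpus = {
    "projA/src/util/Base64.java": "class Base64 {\n    int x;\n}\n",
    "projB/lib/Base64.java": "class Base64 { int x; }",
    "projB/src/Main.java": "class Main {}",
}
clone_groups = find_file_clones(corpus)
```

Because each file is hashed once and grouped through a dictionary, the work grows linearly with the number of files. A sketch like this catches only copies with no modification beyond whitespace; files reused with small edits would need a more tolerant comparison.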
CITATION

J. Ossher, H. Sajnani and C. Lopes, "File cloning in open source Java projects: The good, the bad, and the ugly," 2011 27th IEEE International Conference on Software Maintenance (ICSM), Williamsburg, VA, USA, 2011, pp. 283-292.
doi:10.1109/ICSM.2011.6080795