Issue No.04 - July/August (2009 vol.35)
pp: 497-514
Hamid Abdul Basit, Lahore University of Management Sciences, Lahore
Stan Jarzabek, National University of Singapore, Singapore
ABSTRACT
Code clones are similar program structures recurring in variant forms in one or more software systems. Several techniques have been proposed to detect similar code fragments in software, so-called simple clones. Identifying and subsequently unifying simple clones is beneficial in software maintenance. Even further gains can be obtained by elevating the level of code clone analysis. We observed that recurring patterns of simple clones often indicate the presence of interesting higher-level similarities that we call structural clones. Structural clones show a bigger picture of the similarity situation than simple clones alone. Being logical groups of simple clones, structural clones alleviate the problem of the huge number of clones typically reported by simple clone detection tools, a problem that is often dealt with using postdetection visualization techniques. Detection of structural clones can help in understanding the design of the system for better maintenance and in reengineering for reuse, among other uses. In this paper, we propose a technique to detect some useful types of structural clones. The novelty of our approach includes the formulation of the structural clone concept and the application of data mining techniques to detect these higher-level similarities. We describe a tool called Clone Miner that implements our proposed technique. We assess the usefulness and scalability of the proposed techniques via several case studies. We discuss various usage scenarios to demonstrate in what ways the knowledge of structural clones adds value to the analysis based on simple clones alone.
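The idea of treating recurring groups of simple clones as a higher-level pattern can be illustrated with a small sketch. It assumes, purely for illustration, that each file is a "transaction" listing the simple clone classes found in it, and that a brute-force frequent itemset search stands in for the data mining step. The function name `frequent_clone_sets`, the file names, and the clone class IDs are all hypothetical, and this is not a description of how Clone Miner itself is implemented.

```python
from collections import defaultdict
from itertools import combinations


def frequent_clone_sets(files, min_support=2, max_size=3):
    """Return groups of simple clone class IDs that co-occur in at least
    `min_support` files, mapped to the files that contain them.

    `files` maps a file name to the simple clone classes detected in it.
    Brute-force enumeration: fine for a toy example, not for real systems.
    """
    support = defaultdict(set)
    for name, clone_classes in files.items():
        items = sorted(set(clone_classes))
        # Enumerate every group of 2..max_size clone classes in this file.
        for size in range(2, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                support[combo].add(name)
    return {
        combo: sorted(names)
        for combo, names in support.items()
        if len(names) >= min_support
    }


# Hypothetical output of a simple clone detector (assumed data).
files = {
    "A.java": ["c1", "c2", "c3"],
    "B.java": ["c1", "c2", "c3"],
    "C.java": ["c1", "c4"],
    "D.java": ["c2", "c3", "c4"],
}

patterns = frequent_clone_sets(files)
for combo, names in sorted(patterns.items()):
    print(combo, "->", names)
```

In this toy data, the group (c1, c2, c3) recurs in A.java and B.java, so a single file-level candidate replaces three separate simple clone reports. That is the kind of reduction in reported clones the abstract attributes to structural clones.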
INDEX TERMS
Design concepts, maintainability, restructuring, reverse engineering, reengineering, reusable software.
CITATION
Hamid Abdul Basit, Stan Jarzabek, "A Data Mining Approach for Detecting Higher-Level Clones in Software," IEEE Transactions on Software Engineering, vol. 35, no. 4, pp. 497-514, July/August 2009, doi:10.1109/TSE.2009.16