2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM) (2013)
Eindhoven, Netherlands
Sept. 22, 2013 to Sept. 23, 2013
pp: 31-36
Annervaz K M , Accenture Technology Labs, Bangalore, India
Vikrant Kaulgud , Accenture Technology Labs, Bangalore, India
Janardan Misra , Accenture Technology Labs, Bangalore, India
Shubhashis Sengupta , Accenture Technology Labs, Bangalore, India
Gary Titus , Accenture Technology Labs, Bangalore, India
Azmat Munshi , Accenture Technology Labs, Bangalore, India
ABSTRACT
Source code clustering is an important technique used in software development and maintenance to understand the modular structure of code. An array of clustering algorithms is available, such as simulated annealing based search. Source code has different kinds of features, such as structural and textual features. Collecting these different types of features and computing the relevant feature metrics is a difficult task. Further, clustering algorithms can run on metrics based on different types of source code features or their combinations. This flexibility makes it non-trivial to test the effectiveness of clustering algorithms on a given code base. In this paper, we present a highly configurable clustering workbench that allows the user to collect the various source code features and then to select the code features used for clustering, the clustering algorithm, and its parameters. Clustering quality metrics are computed, which allow comparison of outputs produced by different combinations of code features and algorithms. We also present our specific contribution in multi-dimensional feature analysis and clustering. The tool hides the algorithm complexity from the user, allowing complete focus on understanding the effect of the configuration choices. We have also applied this tool in real-life maintenance projects, where users found it useful for tuning the clustering techniques to the peculiarities of the source code.
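The idea of clustering on a configurable combination of feature types can be illustrated with a minimal sketch. It is not the workbench's implementation: the unit names, the feature sets, the `weights` and `threshold` parameters, and the use of Jaccard similarity with single-link threshold grouping (instead of the simulated annealing based search mentioned above) are all assumptions chosen for brevity.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets; 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def combined_similarity(units, weights):
    """Weighted average of per-feature-kind Jaccard similarity for each pair.

    units:   {unit_name: {feature_kind: set_of_values}}   (hypothetical layout)
    weights: {feature_kind: weight}, must not sum to zero
    """
    total = sum(weights.values())
    sim = {}
    for u, v in combinations(sorted(units), 2):
        score = sum(
            w * jaccard(units[u].get(kind, set()), units[v].get(kind, set()))
            for kind, w in weights.items()
        )
        sim[(u, v)] = score / total
    return sim

def cluster(units, weights, threshold):
    """Single-link grouping: join two units when similarity >= threshold."""
    parent = {u: u for u in units}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), s in combined_similarity(units, weights).items():
        if s >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for u in units:
        groups.setdefault(find(u), set()).add(u)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical code units: "structural" holds referenced types,
# "textual" holds identifier terms.
units = {
    "OrderService": {"structural": {"OrderRepo", "Logger"},
                     "textual": {"order", "customer", "total"}},
    "OrderRepo": {"structural": {"Logger", "Db"},
                  "textual": {"order", "customer", "save"}},
    "ReportPrinter": {"structural": {"Logger"},
                      "textual": {"report", "page", "print"}},
}

both = cluster(units, {"structural": 1.0, "textual": 1.0}, threshold=0.4)
structural_only = cluster(units, {"structural": 1.0}, threshold=0.4)
```

With both feature kinds weighted equally, `OrderService` and `OrderRepo` form one cluster and `ReportPrinter` stays alone; with structural features only, the shared `Logger` dependency pulls all three units into a single cluster. This shows how the choice of features alone changes the result for the same algorithm and threshold.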
INDEX TERMS
Clustering algorithms, Feature extraction, Java, Algorithm design and analysis, Partitioning algorithms, Vectors, Measurement, Experimental Workbench, Clustering, Source Code Analysis, Semantic Indexing
CITATION
Annervaz K M, Vikrant Kaulgud, Janardan Misra, Shubhashis Sengupta, Gary Titus, Azmat Munshi, "Code clustering workbench", 2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 31-36, 2013, doi:10.1109/SCAM.2013.6648181