2013 20th Working Conference on Reverse Engineering (WCRE) (2012)
Kingston, ON, Canada
Oct. 15, 2012 to Oct. 18, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WCRE.2012.21
Software clustering is an important technique for extracting a high-level component architecture from the underlying source code. One limitation of existing approaches is that most of the proposed techniques use only similar types of features to estimate the distance between source code elements. Consequently, when the selected features are poorly represented in the source code, these techniques lack adequate input and may not produce good-quality results. In this paper we propose an approach to overcome this limitation. The proposed approach combines multiple types of features and applies automated weighting to the extracted features to enhance their information quality and reduce noise. We define a way to estimate the distance between code elements in terms of a combination of multiple types of features. Weighted graph partitioning with a multi-objective global modularity criterion is used to select the clusters as architectural components. We describe methods for automated labeling of the extracted components and for generating inter-component interactions. We further discuss how the suggested approach extends to clustering at multiple hierarchical levels, to application portfolios, and even to improving precision for the feature location problem.
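To make the idea of a distance built from several feature types concrete, the following is a minimal sketch and not the paper's actual definition, which the abstract does not spell out. It assumes each code element is described by sparse feature vectors grouped by type (the type names `calls` and `terms`, the weights, and both function names are hypothetical), and it assumes the per-type cosine distances are combined as a weighted average.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a missing feature type gives no evidence of similarity.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def combined_distance(x, y, weights):
    """Weighted average of per-feature-type cosine distances (assumed form)."""
    total = sum(weights.values())
    return sum(
        w * cosine_distance(x.get(ftype, {}), y.get(ftype, {}))
        for ftype, w in weights.items()
    ) / total

# Hypothetical elements: one syntactic feature type and one lexical one.
account = {"calls": {"db.open": 1.0}, "terms": {"account": 2.0, "balance": 1.0}}
ledger = {"calls": {"db.open": 1.0}, "terms": {"invoice": 1.0}}
widget = {"calls": {"ui.draw": 1.0}, "terms": {"widget": 1.0}}

weights = {"calls": 0.25, "terms": 0.75}  # illustrative weights only
d_account_ledger = combined_distance(account, ledger, weights)
d_account_widget = combined_distance(account, widget, weights)
```

Under these assumptions, `account` and `ledger` share their call features but not their vocabulary, so they end up closer to each other than `account` and `widget`, which share nothing. Pairwise distances of this kind could then serve as edge weights for a graph partitioning step.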
Reverse engineering, latent semantic indexing, software clustering, architectural recovery, component discovery, program comprehension, lexical analysis, vector space model
Janardan Misra, K.M. Annervaz, Vikrant Kaulgud, Shubhashis Sengupta, Gary Titus, "Software Clustering: Unifying Syntactic and Semantic Features", 2013 20th Working Conference on Reverse Engineering (WCRE), pp. 113-122, 2012, doi:10.1109/WCRE.2012.21