Hierarchical Clustering for Software Architecture Recovery
November 2007 (vol. 33 no. 11)
pp. 759-780
Abstract: Gaining an architecture-level understanding of a software system is important for many reasons. When no description of a system's architecture exists, attempts must be made to recover it. In recent years, researchers have explored the use of clustering for recovering a software system's architecture, given only its source code. The main contributions of this paper are as follows. First, we review hierarchical clustering research in the context of software architecture recovery and modularization. Second, to employ clustering meaningfully, it is necessary to understand the peculiarities of the software domain and the behavior of clustering measures and algorithms in this domain. To this end, we provide a detailed analysis of the behavior of various similarity and distance measures that may be employed for software clustering. Third, we analyze the clustering process of various well-known clustering algorithms using multiple criteria, and show how arbitrary decisions taken by these algorithms during clustering affect the quality of their results. Finally, we present an analysis of two recently proposed clustering algorithms, revealing close similarities between their apparently different clustering approaches. Experiments on four legacy software systems provide insight into the behavior of well-known clustering algorithms and their characteristics in the software domain.
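To make the ideas in the abstract concrete, here is a minimal Python sketch of agglomerative hierarchical clustering of software entities. It is not the authors' code. It makes several assumptions:

- Each entity is described by a set of binary features, such as the globals or functions it references.
- Similarity is measured with the Jaccard coefficient, and clusters are compared with complete linkage. These are just two common choices among the many measures and algorithms such a study compares.
- A tie for the maximum similarity is treated as one source of the "arbitrary decisions" the abstract mentions.

The file names, feature sets, and the functions `jaccard`, `complete_linkage`, and `agglomerate` are all hypothetical. The two recently proposed algorithms analyzed in the paper are not reproduced here.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example data: each entity (a source file) is described by the
# set of features (e.g. global variables or functions) it references.
FEATURES = {
    "parser.c": {"token", "ast", "error"},
    "lexer.c": {"token", "error", "buffer"},
    "eval.c": {"ast", "env", "error"},
    "gc.c": {"heap", "env"},
}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|, kept as an exact fraction."""
    union = a | b
    return Fraction(len(a & b), len(union)) if union else Fraction(0)


def complete_linkage(c1, c2, features):
    """Cluster similarity = similarity of the least similar pair of members."""
    return min(jaccard(features[x], features[y]) for x in c1 for y in c2)


def agglomerate(features, choose=lambda tied: tied[0]):
    """Repeatedly merge the most similar pair of clusters until one remains.

    Returns one (merged_cluster, similarity, tie_count) tuple per merge.
    Assumption: tie_count > 1 is treated as an arbitrary decision, because
    several pairs share the maximum similarity and `choose` picks one of them.
    """
    clusters = [frozenset([name]) for name in sorted(features)]
    steps = []
    while len(clusters) > 1:
        scored = [(complete_linkage(a, b, features), a, b)
                  for a, b in combinations(clusters, 2)]
        best = max(score for score, _, _ in scored)
        tied = [(a, b) for score, a, b in scored if score == best]
        a, b = choose(tied)  # the (possibly arbitrary) merge choice
        merged = a | b
        clusters = [c for c in clusters if c != a and c != b] + [merged]
        steps.append((merged, best, len(tied)))
    return steps


if __name__ == "__main__":
    for merged, sim, ties in agglomerate(FEATURES):
        print(sorted(merged), sim, "ties:", ties)
```

With the default tie-break, `parser.c` first merges with `eval.c` at similarity 1/2, where two pairs are tied. Passing `choose=lambda tied: tied[-1]` merges it with `lexer.c` instead. In that case `eval.c` next joins `gc.c` at 1/4, so the hierarchy differs even though the input is unchanged. Exact `Fraction` values are used so that ties are detected without floating-point rounding.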

Index Terms:
Software Engineering; Restructuring, reverse engineering, and reengineering; architecture recovery; hierarchical clustering; arbitrary decisions
Citation:
Onaiza Maqbool, Haroon Babri, "Hierarchical Clustering for Software Architecture Recovery," IEEE Transactions on Software Engineering, vol. 33, no. 11, pp. 759-780, Nov. 2007, doi:10.1109/TSE.2007.70732