Issue No. 02, March/April 2010 (vol. 36)
pp: 248-274
Doug Kimelman, IBM Thomas J. Watson Research Center, Yorktown Heights
Marsha Kimelman, Independent Consultant
David Mandelin, Mozilla Corporation, Mountain View
Daniel M. Yellin, IBM Israel Software Lab, Jerusalem
ABSTRACT
IT system architectures and many other kinds of structured artifacts are often described by formal models or informal diagrams. In practice, there are often a number of versions of a model or diagram, such as a series of revisions, divergent variants, or multiple views of a system. Understanding how versions correspond or differ is crucial, and thus, automated assistance for matching models and diagrams is essential. We have designed a framework for finding these correspondences automatically based on Bayesian methods. We represent models and diagrams as graphs whose nodes have attributes such as name, type, connections to other nodes, and containment relations, and we have developed probabilistic models for rating the quality of candidate correspondences based on various features of the nodes in the graphs. Given the probabilistic models, we can find high-quality correspondences using search algorithms. Preliminary experiments focusing on architectural models suggest that the technique is promising.
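The general idea of rating candidate correspondences probabilistically and then searching for a good overall matching can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Node` class, the two features (type equality and name similarity), the likelihood constants, and the greedy search, which merely stands in for whatever search algorithms the authors use.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Node:
    """A diagram element reduced to two attributes (hypothetical simplification)."""
    name: str
    type: str


# Hypothetical probabilities, chosen by hand for this sketch only.
PRIOR_MATCH = 0.2                # prior probability that a random pair corresponds
P_SAME_TYPE_IF_MATCH = 0.9       # P(types equal | pair corresponds)
P_SAME_TYPE_IF_NONMATCH = 0.3    # P(types equal | pair does not correspond)


def name_similarity(a: Node, b: Node) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def log_odds(a: Node, b: Node) -> float:
    """Posterior log-odds that a and b correspond, treating features as independent."""
    score = math.log(PRIOR_MATCH / (1 - PRIOR_MATCH))
    # Type feature: add the log-likelihood ratio of the observed outcome.
    if a.type == b.type:
        score += math.log(P_SAME_TYPE_IF_MATCH / P_SAME_TYPE_IF_NONMATCH)
    else:
        score += math.log((1 - P_SAME_TYPE_IF_MATCH) / (1 - P_SAME_TYPE_IF_NONMATCH))
    # Name feature: assumed densities p(s | match) = 2s and p(s | non-match) = 2(1 - s),
    # so the likelihood ratio is s / (1 - s). Clamp s to avoid log(0).
    s = min(max(name_similarity(a, b), 0.01), 0.99)
    score += math.log(s / (1 - s))
    return score


def greedy_match(left: list[Node], right: list[Node]) -> dict[str, str]:
    """Greedy one-to-one matching: accept pairs in order of decreasing log-odds."""
    candidates = sorted(
        ((log_odds(a, b), i, j) for i, a in enumerate(left) for j, b in enumerate(right)),
        reverse=True,
    )
    used_left, used_right = set(), set()
    result = {}
    for score, i, j in candidates:
        if score <= 0:  # remaining pairs are more likely non-matches than matches
            break
        if i in used_left or j in used_right:
            continue
        used_left.add(i)
        used_right.add(j)
        result[left[i].name] = right[j].name
    return result
```

Calling `greedy_match` on two short lists of `Node` objects returns a dictionary from names in the first list to names in the second, leaving unmatched any node whose best candidate has non-positive log-odds. A fuller treatment would also use connection and containment features, as the abstract describes.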
INDEX TERMS
Bayesian techniques, IT system architecture, modeling tools, change control.
CITATION
Doug Kimelman, Marsha Kimelman, David Mandelin, Daniel M. Yellin, "Bayesian Approaches to Matching Architectural Diagrams", IEEE Transactions on Software Engineering, vol.36, no. 2, pp. 248-274, March/April 2010, doi:10.1109/TSE.2009.56