Uniform Techniques for Deriving Similarities of Objects and Subschemes in Heterogeneous Databases
March/April 2003 (vol. 15 no. 2)
pp. 271-294

Abstract: Automatic tools for inferring the semantics of database schemes are useful for solving several database design problems, such as obtaining Cooperative Information Systems or Data Warehouses from large sets of data sources. In this context, a main problem is to single out similarities or dissimilarities among scheme objects (interscheme properties). This paper presents graph-based techniques for the uniform derivation of interscheme properties, including synonymies, homonymies, type conflicts, and subscheme similarities. These techniques share a common core: the computation of maximum weight matchings on bipartite weighted graphs derived using a suitable metric that measures the semantic closeness of objects. The techniques have been implemented in a system prototype. Several experiments conducted with it, some of which are reported in the paper, confirmed the effectiveness of our approach.
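The common core named in the abstract, a maximum weight matching on a bipartite weighted graph, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the object names, the closeness values, and the function name `max_weight_matching` are hypothetical, and the exhaustive search stands in for whatever matching algorithm and closeness metric the paper actually uses.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Return (total, pairs) for a maximum weight matching of a small bipartite graph.

    weights[i][j] is an assumed closeness score between object i of the first
    scheme and object j of the second scheme; 0.0 means "no edge".
    Exhaustive search over assignments: only suitable for tiny inputs.
    """
    n_left = len(weights)
    n_right = len(weights[0]) if weights else 0
    size = max(n_left, n_right)
    # Pad to a square matrix; padded cells are zero-weight (absent) edges.
    padded = [
        [weights[i][j] if i < n_left and j < n_right else 0.0 for j in range(size)]
        for i in range(size)
    ]
    best_total, best_pairs = 0.0, []
    for perm in permutations(range(size)):
        # Keep only real edges; zero-weight cells contribute nothing.
        pairs = [(i, j) for i, j in enumerate(perm) if padded[i][j] > 0]
        total = sum(padded[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


# Hypothetical objects from two schemes and made-up closeness scores.
scheme_a = ["Employee", "Dept"]
scheme_b = ["Worker", "Division", "Project"]
closeness = [
    [0.9, 0.1, 0.2],  # Employee vs. Worker, Division, Project
    [0.2, 0.8, 0.3],  # Dept     vs. Worker, Division, Project
]

total, pairs = max_weight_matching(closeness)
matched = [(scheme_a[i], scheme_b[j]) for i, j in pairs]
print(matched, round(total, 2))
```

In this toy input the matching pairs Employee with Worker and Dept with Division, which is the kind of candidate synonymy the abstract refers to. For graphs of realistic size, a polynomial-time matching algorithm would replace the exhaustive search.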

Index Terms:
Synonymies, homonymies, type conflicts, subscheme similarities, derivation of database semantics, heterogeneous databases, database interoperability.
Luigi Palopoli, Domenico Saccà, Giorgio Terracina, Domenico Ursino, "Uniform Techniques for Deriving Similarities of Objects and Subschemes in Heterogeneous Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 2, pp. 271-294, March-April 2003, doi:10.1109/TKDE.2003.1185834