Global Viewing of Heterogeneous Data Sources
March/April 2001 (vol. 13 no. 2)
pp. 277-297

Abstract: Defining global views of heterogeneous data sources to support querying and cooperation is becoming increasingly important as multiple data sources become available within complex organizations and in global information systems. A global view provides a unified representation of the information in the different sources; it is defined by analyzing the conceptual schemas associated with the sources and resolving possible semantic heterogeneity. In this paper, we propose an affinity-based unification method for global view construction. The method has three steps: 1) the concept of affinity is introduced to assess the level of semantic relationship between elements in different schemas, taking semantic heterogeneity into account; 2) schema elements are classified by affinity level using clustering procedures, so that their different representations can be analyzed for unification; 3) global views are constructed from selected clusters by unifying the representations of their elements. We describe experiences of applying the proposed unification method and the associated tool environment, ARTEMIS, to databases of the Italian Public Administration information systems.
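
As a rough illustration of the three steps in the abstract, the Python sketch below scores pairs of schema elements, groups them by agglomerative clustering, and merges each cluster into one unified element. Everything specific in it is an assumption made for illustration: the sample schemas, the `affinity` function (a weighted mix of name similarity and attribute overlap), the weight `w_name`, the average-link merge rule, and the threshold. None of these are the coefficients or procedures defined in the paper.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical schema elements: (source, element name) -> attribute names.
ELEMENTS = {
    ("S1", "Employee"): {"name", "salary", "dept"},
    ("S2", "Employees"): {"name", "salary", "office"},
    ("S1", "Department"): {"code", "budget"},
    ("S2", "Dept"): {"code", "budget", "manager"},
}


def affinity(a, b, w_name=0.5):
    """Toy affinity in [0, 1] (an assumption, not the paper's definition).

    Mixes string similarity of the element names with the Jaccard
    overlap of their attribute sets.
    """
    name_sim = SequenceMatcher(None, a[1].lower(), b[1].lower()).ratio()
    attrs_a, attrs_b = ELEMENTS[a], ELEMENTS[b]
    struct_sim = len(attrs_a & attrs_b) / len(attrs_a | attrs_b)
    return w_name * name_sim + (1 - w_name) * struct_sim


def link(c1, c2):
    """Average pairwise affinity between two clusters."""
    total = sum(affinity(x, y) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def cluster(elements, threshold=0.5):
    """Merge the two closest clusters until no pair reaches the threshold."""
    clusters = [[e] for e in elements]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        if link(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def unify(members):
    """Unified element for a cluster: here, simply the union of attributes."""
    unified = set()
    for member in members:
        unified |= ELEMENTS[member]
    return unified


if __name__ == "__main__":
    for members in cluster(list(ELEMENTS)):
        names = [f"{src}.{name}" for src, name in members]
        print(names, "->", sorted(unify(members)))
```

In this toy setup the two employee elements end up in one cluster and the two department elements in another, and each cluster's unified element is the union of its members' attributes. A realistic unification step would also need to reconcile naming and type conflicts rather than taking a plain union.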

Index Terms:
Heterogeneous data sources, affinity, clustering, global view, schema analysis and unification.
Silvana Castano, Valeria De Antonellis, Sabrina De Capitani di Vimercati, "Global Viewing of Heterogeneous Data Sources," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 2, pp. 277-297, March-April 2001, doi:10.1109/69.917566