Scaling Access to Heterogeneous Data Sources with DISCO
September/October 1998 (vol. 10 no. 5)
pp. 808-823

Abstract: Accessing many data sources aggravates problems for users of heterogeneous distributed databases. Database administrators must deal with fragile mediators, that is, mediators whose schemas and views must be significantly changed to incorporate a new data source. When implementing translators of queries from mediators to data sources, database implementors must deal with data sources that do not support all the functionality required by mediators. Application programmers must deal with graceless failures: when data sources are unavailable for query processing, queries simply return failure and no further information. The Distributed Information Search COmponent (DISCO) addresses these problems. Data modeling techniques manage the connections to data sources, so sources can be added transparently to users and applications. The interface between mediators and data sources flexibly handles different query languages and different data source functionality. Query rewriting and optimization techniques rewrite queries so that they are evaluated efficiently by the sources. Query processing and evaluation semantics are developed to process queries over unavailable data sources. In this article, we describe 1) the distributed mediator architecture of DISCO; 2) the data model and its modeling of data source connections; 3) the interface to underlying data sources and the query rewriting process; and 4) query processing semantics. We also describe several advantages of our system.
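
The idea of processing a query over unavailable data sources can be illustrated with a minimal Python sketch. It is not DISCO's implementation or API: the names `Source`, `PartialAnswer`, `evaluate`, and `resume` are hypothetical, and a simple union of filtered rows stands in for a real mediator query. The sketch only shows the general pattern of returning the data obtained from the available sources together with a record of the work still pending, instead of failing outright.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]


@dataclass
class Source:
    """Hypothetical stand-in for a wrapped data source."""
    name: str
    rows: List[Row]
    available: bool = True

    def scan(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # An unavailable source raises instead of returning rows.
        if not self.available:
            raise ConnectionError(f"source {self.name} is unavailable")
        return [row for row in self.rows if predicate(row)]


@dataclass
class PartialAnswer:
    """Rows obtained so far, plus the names of sources not yet queried."""
    rows: List[Row] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        return not self.pending


def evaluate(sources: List[Source],
             predicate: Callable[[Row], bool],
             only: Optional[Set[str]] = None) -> PartialAnswer:
    """Query every source (or only those named in `only`).

    Sources that cannot be reached are recorded as pending rather than
    causing the whole query to fail.
    """
    answer = PartialAnswer()
    for source in sources:
        if only is not None and source.name not in only:
            continue
        try:
            answer.rows.extend(source.scan(predicate))
        except ConnectionError:
            answer.pending.append(source.name)
    return answer


def resume(previous: PartialAnswer,
           sources: List[Source],
           predicate: Callable[[Row], bool]) -> PartialAnswer:
    """Retry only the pending sources and merge with the earlier rows."""
    later = evaluate(sources, predicate, only=set(previous.pending))
    return PartialAnswer(previous.rows + later.rows, later.pending)
```

In this sketch an application would call `evaluate` once, use whatever rows came back, and call `resume` later with the same predicate when the pending sources may have become reachable again.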

Index Terms:
Heterogeneous database, query reformulation, source capability, heterogeneous cost model, partial answer, partial evaluation.
Citation:
Anthony Tomasic, Louiqa Raschid, Patrick Valduriez, "Scaling Access to Heterogeneous Data Sources with DISCO," IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 5, pp. 808-823, Sept.-Oct. 1998, doi:10.1109/69.729736