Query Optimization in Multidatabase Systems Considering Schema Conflicts
November-December 1997 (vol. 9 no. 6)
pp. 941-955

Abstract: In a multidatabase system, the participating databases are autonomous. Their schemas may differ in various ways even though they represent the same information. A global query issued against the global database must therefore be translated into a suitable form before it can be executed in a local database. Since the data requested by a query (or by part of a query) is sometimes available at multiple sites, the desired query processing site is the site (database) that processes the query at the least cost. In this paper, we study the effect of schema differences on the cost of query processing in a multidatabase environment. We first classify schema conflicts into different types. For each type of conflict, we show how much more or less complex a translated query can become in comparison with the global query originally issued by the user. Based on this observation, we propose an analytical method that considers the conflicts between local databases and finds the database(s) that render the least execution cost in processing a global query. This research introduces a new level of query optimization (termed schema-level optimization) in multidatabase environments. Our results provide a new dimension of enhancement for the capability of query optimizers in multidatabase systems.
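The site-selection idea in the abstract can be illustrated with a minimal sketch. This is not the paper's analytical method, whose cost formulas are not given here. It assumes, purely for illustration, that each schema conflict scales the cost of the translated query by a multiplicative factor (above 1 when translation makes the query more complex, below 1 when it makes it simpler). The names `Site`, `base_cost`, `conflict_factors`, `translated_cost`, and `cheapest_sites` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Site:
    """A candidate local database (hypothetical model, not from the paper)."""
    name: str
    base_cost: float                     # assumed cost of the query with no conflicts
    conflict_factors: Tuple[float, ...]  # assumed multiplier per schema conflict


def translated_cost(site: Site) -> float:
    """Estimate the cost of the translated query at one site.

    Assumption: conflicts combine multiplicatively. The paper's own
    cost model may be entirely different.
    """
    cost = site.base_cost
    for factor in site.conflict_factors:
        cost *= factor
    return cost


def cheapest_sites(sites: List[Site]) -> List[str]:
    """Return the names of all sites tied for the lowest estimated cost."""
    if not sites:
        return []
    costs = {site.name: translated_cost(site) for site in sites}
    best = min(costs.values())
    return sorted(name for name, cost in costs.items() if cost == best)


if __name__ == "__main__":
    candidates = [
        Site("A", 100.0, (1.5,)),      # one conflict that complicates the query
        Site("B", 100.0, (0.5, 2.0)),  # two conflicts that cancel out
        Site("C", 200.0, (0.5,)),      # costlier site, but translation simplifies
    ]
    print(cheapest_sites(candidates))
```

Returning a list rather than a single name mirrors the abstract's wording "database(s)": more than one site may tie for the least cost.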

Index Terms:
Multidatabase systems, query optimization, schema conflicts, heterogeneous databases, interoperability, autonomous systems.
Chiang Lee, Chia-Jung Chen, "Query Optimization in Multidatabase Systems Considering Schema Conflicts," IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 6, pp. 941-955, Nov.-Dec. 1997, doi:10.1109/69.649318