Semantic Query Optimization for Query Plans of Heterogeneous Multidatabase Systems
November/December 2000 (vol. 12 no. 6)
pp. 959-978

Abstract—New applications of information systems, such as electronic commerce and healthcare information systems, need to integrate a large number of heterogeneous databases over computer networks. Answering a query in these applications usually involves selecting relevant information sources and generating a query plan to combine the data automatically. As significant progress has been made in source selection and plan generation, the critical issue has been shifting to query optimization. This paper presents a semantic query optimization (SQO) approach to optimizing query plans of heterogeneous multidatabase systems. This approach provides global optimization for query plans as well as local optimization for subqueries that retrieve data from individual database sources. An important feature of our local optimization algorithm is that we prove necessary and sufficient conditions to eliminate an unnecessary join in a conjunctive query of arbitrary join topology. This feature allows our optimizer to utilize more expressive relational rules to provide a wider range of possible optimizations than previous work in SQO. The local optimization algorithm also features a new data structure called AND-OR implication graphs to facilitate the search for optimal queries. These features allow the global optimization to effectively use semantic knowledge to reduce data transmission cost. We have implemented this approach into the PESTO query plan optimizer as a part of the SIMS information mediator. Experimental results demonstrate that PESTO can provide significant savings in query execution cost over query plan execution without optimization.
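
As a rough illustration of the general idea behind semantic query optimization, the sketch below drops a constraint from a conjunctive query when semantic rules show that the remaining constraints already imply it. This is not the PESTO algorithm: it does not perform join elimination and does not use AND-OR implication graphs. The rule, the constraint strings, and the function names (`closure`, `drop_implied`) are hypothetical and exist only for this example.

```python
from typing import FrozenSet, List, Set, Tuple

# A semantic rule: if all antecedent constraints hold, the consequent holds.
# Assumption: rules are valid in every state of the database.
Rule = Tuple[FrozenSet[str], str]


def closure(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Forward-chain the rules until no new constraint can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived


def drop_implied(query: List[str], rules: List[Rule]) -> List[str]:
    """Remove each constraint that the constraints still kept already imply."""
    kept = list(query)
    for constraint in query:
        rest = {c for c in kept if c != constraint}
        if constraint in closure(rest, rules):
            kept = [c for c in kept if c != constraint]
    return kept


# Hypothetical rule: every tanker is longer than 200 units.
rules = [(frozenset({"ship.class = 'tanker'"}), "ship.length > 200")]
query = [
    "ship.class = 'tanker'",
    "ship.length > 200",
    "ship.port = 'Long Beach'",
]
optimized = drop_implied(query, rules)
print(optimized)
```

Under the stated rule, the length constraint is redundant, so the rewritten query returns the same answers while evaluating one constraint fewer. Whether such a rewrite actually lowers execution cost depends on the data source, which is why a real optimizer would pair it with a cost model.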

Index Terms:
Semantic query optimization, heterogeneous multidatabase systems, relational rules, joins, information mediators.
Chun-Nan Hsu, Craig A. Knoblock, "Semantic Query Optimization for Query Plans of Heterogeneous Multidatabase Systems," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 6, pp. 959-978, Nov.-Dec. 2000, doi:10.1109/69.895804