Optimizing Queries with Foreign Functions in a Distributed Environment
July/August 2002 (vol. 14 no. 4)
pp. 809-824

Foreign functions have been introduced in advanced database systems to support complex applications. In this paper, we consider the optimization of queries with foreign functions in a distributed environment. In traditional distributed query processing, selections are processed locally before joins wherever possible, so that the relations to be transmitted and joined are smaller. However, if selection predicates involve foreign functions, the cost of evaluating selections can no longer be ignored. As a result, the execution order of selections and joins becomes significant, and query optimization must carefully weigh the trade-off among the costs of data transmission, join processing, and selection predicate evaluation. We develop a response time model for estimating the cost of distributed query processing involving foreign functions. We explore the properties of the problem and give an optimal polynomial-time algorithm for a special case of it. Finding the optimal execution plan in the general case, however, is NP-hard. We therefore propose an efficient heuristic algorithm for the problem, and simulation results indicate that it produces plans of good quality. The results also apply to advanced database systems and to multidatabase systems, where the conversion functions defined for schema integration can be regarded as a type of foreign function.
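The ordering trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's response time model. It assumes a single relation R at one site, one foreign-function predicate on R, and a join with a relation at a second site, and it simply sums the costs sequentially. All names (`Params`, `select_then_join`, `join_then_select`) and all cost parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical cost parameters; none of these come from the paper."""
    n_r: int          # number of tuples in R
    sel: float        # fraction of R that passes the foreign-function predicate
    join_frac: float  # fraction of R tuples that survive the join
    c_func: float     # cost of one foreign-function evaluation
    c_ship: float     # cost of shipping one tuple to the join site
    c_join: float     # cost of processing one R tuple in the join


def select_then_join(p: Params) -> float:
    """Traditional order: evaluate the predicate locally, then ship and join."""
    evaluate = p.n_r * p.c_func          # function runs on every tuple of R
    survivors = p.n_r * p.sel            # only these are shipped and joined
    return evaluate + survivors * (p.c_ship + p.c_join)


def join_then_select(p: Params) -> float:
    """Deferred order: ship and join all of R, then evaluate the predicate."""
    ship_and_join = p.n_r * (p.c_ship + p.c_join)
    joined = p.n_r * p.join_frac         # function runs only on join survivors
    return ship_and_join + joined * p.c_func


cheap = Params(n_r=1000, sel=0.1, join_frac=0.05, c_func=0.01, c_ship=1.0, c_join=1.0)
costly = Params(n_r=1000, sel=0.1, join_frac=0.05, c_func=100.0, c_ship=1.0, c_join=1.0)

for name, p in (("cheap function", cheap), ("costly function", costly)):
    print(name, select_then_join(p), join_then_select(p))
```

Under these assumed numbers, selecting first is cheaper when the function is cheap, while deferring the selection until after the join is cheaper when each evaluation is expensive and the join is selective. With several relations, sites, and predicates, the number of possible orderings grows quickly, which is the setting the paper's optimal and heuristic algorithms address.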

Index Terms:
Distributed environment, foreign function, query optimization, response time model, simulation.
Pauray S.M. Tsai, Arbee L.P. Chen, "Optimizing Queries with Foreign Functions in a Distributed Environment," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 4, pp. 809-824, July-Aug. 2002, doi:10.1109/TKDE.2002.1019215