Combinatorial Optimization of Distributed Queries
December 1995 (vol. 7 no. 6)
pp. 915-927

Abstract: In distributed relational databases, the cost of a query consists of a local cost and a transmission cost. Query optimization is a combinatorial optimization problem, and as queries grow larger, optimization methods based on exhaustive search become too expensive. We propose the following strategy for solving large distributed query optimization problems in relational database systems: 1) represent each query-processing schedule by a labeled directed graph, 2) reduce the number of candidate schedules by pruning invalid or high-cost solutions, and 3) find a suboptimal schedule by combinatorial optimization. We investigate several combinatorial optimization techniques: random search, single start, multistart, simulated annealing, and a combination of random search and local simulated annealing. We demonstrate the utility of combinatorial optimization on the problem of finding the (sub)optimal semijoin schedule that fully reduces all relations of a tree query. The combination of random search and local simulated annealing outperformed the other methods tested.
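The search strategies named in the abstract can be compared on a small toy problem. The following is a minimal Python sketch, not the authors' implementation: the cost function, the swap neighborhood, the cooling schedule, and every name (`make_instance`, `cost`, `random_search`, `local_descent`, `multistart`, `simulated_annealing`, `random_then_local_sa`) are assumptions made for illustration. A schedule is modeled as a permutation, and its cost as a hypothetical local term plus a transmission term. "Single start" and "multistart" are read in their usual sense from stochastic global optimization (local descent from one or from several random starting points), and the hybrid simply seeds a short, low-temperature annealing run with the best random sample; the paper's exact variants may differ. The labeled-graph representation and the pruning step are not modeled.

```python
import math
import random

# Hypothetical toy model standing in for the paper's schedule cost.
# A schedule is a permutation of relation indices 0..n-1 (requires n >= 2).

def make_instance(n, seed=0):
    """Random relation sizes and a site-to-site transfer cost matrix (toy data)."""
    rng = random.Random(seed)
    sizes = [rng.uniform(10, 100) for _ in range(n)]
    trans = [[0.0 if i == j else rng.uniform(1, 20) for j in range(n)]
             for i in range(n)]
    return sizes, trans

def cost(schedule, sizes, trans, sel=0.8):
    """Toy cost = local cost + transmission cost (assumed, illustrative only).

    The k-th relation processed is assumed to be shrunk by k earlier
    reductions (factor sel each), so its local cost is sizes[r] * sel**k;
    each hop between consecutive sites adds trans[a][b].
    """
    local = sum(sizes[r] * sel ** k for k, r in enumerate(schedule))
    transfer = sum(trans[a][b] for a, b in zip(schedule, schedule[1:]))
    return local + transfer

def random_schedule(n, rng):
    s = list(range(n))
    rng.shuffle(s)
    return s

def neighbor(schedule, rng):
    """Swap two positions: a simple neighborhood move on permutations."""
    s = list(schedule)
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def random_search(f, n, samples, rng):
    """Evaluate independent random schedules and keep the cheapest."""
    best, best_cost = None, math.inf
    for _ in range(samples):
        s = random_schedule(n, rng)
        c = f(s)
        if c < best_cost:
            best, best_cost = s, c
    return best, best_cost

def local_descent(f, start, rng, patience=200):
    """Single start: accept only improving swaps until `patience` misses in a row."""
    cur, cur_cost, misses = list(start), f(start), 0
    while misses < patience:
        cand = neighbor(cur, rng)
        cand_cost = f(cand)
        if cand_cost < cur_cost:
            cur, cur_cost, misses = cand, cand_cost, 0
        else:
            misses += 1
    return cur, cur_cost

def multistart(f, n, starts, rng):
    """Multistart: local descent from several random schedules; keep the best."""
    runs = [local_descent(f, random_schedule(n, rng), rng) for _ in range(starts)]
    return min(runs, key=lambda run: run[1])

def simulated_annealing(f, start, rng, t0=10.0, alpha=0.995, steps=2000):
    """Metropolis acceptance with geometric cooling; returns the best schedule seen."""
    cur, cur_cost = list(start), f(start)
    best, best_cost, t = cur, cur_cost, t0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        cand_cost = f(cand)
        delta = cand_cost - cur_cost
        # Always accept improvements; accept uphill moves with prob. exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        t = max(t * alpha, 1e-12)  # floor avoids division by zero on long runs
    return best, best_cost

def random_then_local_sa(f, n, samples, rng, t0=1.0, steps=500):
    """Hybrid (our assumption of the combination): random search picks a start,
    then a short, low-temperature annealing run refines it locally."""
    start, _ = random_search(f, n, samples, rng)
    return simulated_annealing(f, start, rng, t0=t0, steps=steps)

if __name__ == "__main__":
    n = 8
    sizes, trans = make_instance(n, seed=1)
    f = lambda s: cost(s, sizes, trans)
    rng = random.Random(42)
    results = {
        "random search": random_search(f, n, 300, rng),
        "single start": local_descent(f, random_schedule(n, rng), rng),
        "multistart": multistart(f, n, 10, rng),
        "simulated annealing": simulated_annealing(f, random_schedule(n, rng), rng),
        "random search + local SA": random_then_local_sa(f, n, 300, rng),
    }
    for name, (s, c) in results.items():
        print(f"{name:>25}: cost={c:8.2f}  schedule={s}")
```

Every method takes an explicit `random.Random`, so runs are reproducible. The methods here use different numbers of cost evaluations; a fair comparison of the kind the abstract reports would need equal evaluation budgets, which this sketch does not enforce.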
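To make "fully reduces all relations of a tree query" concrete: for a tree query, the classic semijoin program applies semijoins from the leaves toward the root and then from the root back to the leaves, after which every relation contains only tuples that participate in the final join result. The minimal Python sketch below assumes in-memory relations stored as lists of dicts and a rooted join tree given as `children` (parent name mapped to a list of (child name, join attribute) pairs). It counts the distinct join values shipped per semijoin as a rough, assumed stand-in for transmission cost. The names `semijoin` and `full_reduce` are hypothetical. The sketch fixes one traversal order and does not search for a cheap schedule, which is the problem the paper addresses.

```python
def semijoin(r, s, attr):
    """r semijoin s on attr: keep tuples of r whose attr value occurs in s.

    Also returns how many distinct attr values were shipped from s, a rough
    proxy (assumed here) for the transmission cost when s is at another site.
    """
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys], len(keys)


def full_reduce(rels, children, root):
    """Two-pass semijoin program for a tree query.

    rels: relation name -> list of tuples (dicts); children: name ->
    [(child name, join attribute)], describing a join tree rooted at root.
    Returns the reduced relations and the total number of values shipped.
    """
    rels = {name: list(tuples) for name, tuples in rels.items()}  # leave input intact
    shipped = 0

    def up(node):  # leaves-to-root pass: parent := parent semijoin child
        nonlocal shipped
        for child, attr in children.get(node, []):
            up(child)
            rels[node], n = semijoin(rels[node], rels[child], attr)
            shipped += n

    def down(node):  # root-to-leaves pass: child := child semijoin parent
        nonlocal shipped
        for child, attr in children.get(node, []):
            rels[child], n = semijoin(rels[child], rels[node], attr)
            shipped += n
            down(child)

    up(root)
    down(root)
    return rels, shipped


if __name__ == "__main__":
    # Hypothetical tree query R(a, b) - S(b, c) - T(c, d), rooted at S.
    rels = {
        "R": [{"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 9}],
        "S": [{"b": 1, "c": 10}, {"b": 2, "c": 20}, {"b": 3, "c": 30}],
        "T": [{"c": 10, "d": "x"}, {"c": 30, "d": "y"}, {"c": 40, "d": "z"}],
    }
    reduced, shipped = full_reduce(rels, {"S": [("R", "b"), ("T", "c")]}, "S")
    for name, tuples in reduced.items():
        print(name, tuples)
    print("values shipped:", shipped)
```

On this toy data each relation shrinks to the single tuple that contributes to the join result, and 8 join values are shipped in total. A different order of the same semijoins can ship a different number of values, which is why choosing the schedule is an optimization problem.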

Index Terms:
Combinatorial optimization, distributed query processing, multistart, random search, relational database, semijoin, simulated annealing, single start, tree query.
Bojan Groselj, Qutaibah M. Malluhi, "Combinatorial Optimization of Distributed Queries," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 6, pp. 915-927, Dec. 1995, doi:10.1109/69.476497