This Article 
 Bibliographic References 
 Add to: 
Optimizing Large Join Queries Using A Graph-Based Approach
March/April 2001 (vol. 13 no. 2)
pp. 298-315

Abstract—Although many query tree optimization strategies have been proposed in the literature, there still is a lack of a formal and complete representation of all possible permutations of query operations (i.e., execution plans) in a uniform manner. A graph-theoretic approach presented in this paper provides a sound mathematical basis for representing a query and searching for an execution plan. In this graph model, a node represents an operation and a directed edge between two nodes indicates the order of executing these two operations in an execution plan. Each node is associated with a weight and so is an edge. The weight is an expression containing optimization required parameters, such as relation size, tuple size, join selectivity factors. All possible execution plans are representable in this graph and each spanning tree of the graph becomes an execution plan. It is a general model which can be used in the optimizer of a DBMS for interal query representation. On the basis of this model, we devise an algorithm that finds a near optimal execution plan using only polynomial time. The algorithm is compared with a few other popular optimization methods. Experiments show that the proposed algorithm is superior to the others under most circumstances.

[1] M.-S. Chen, M.-L. Lo, P. S. Yu, and H. C. Young,“Using segmented right-deep trees for the execution of pipelined hash joins,”inProc. 18th Int. Conf. Very Large Databases, Aug. 1992, pp. 15–26.
[2] G. Chartrand and O.R. Oellermann, Applied and Algorithmic Graph Theory. McGraw-Hill, 1993.
[3] D.J. DeWitt, R.H. Katz, F. Olken, L.D. Shapiro, and M.R. Stonebraker, “Implementation Techniques for Main Memory Database Systems,” Proc. ACM SIGMOD, 1984.
[4] W.-C. Hou, G. Ozsoyoglu, and E. Dogdu, "Error-Constrained Count Query Evaluation in Relational Databases," Proc. ACM SIGMOD Conf., pp. 278-287, Aug. 1991.
[5] K.A. Hua and C. Lee, ”An Adaptive Data Placement Scheme for Parallel Database Computer Systems,” Proc. Int'l Conf. Very Large Data Bases, 1990.
[6] K. Hua and C. Lee,“Handling data skew in multiprocessor database computers using partition tuning,”inProc. 17th Int. Conf. Very Large Databases, Barcelona, Spain, Sept. 1991, pp. 525–535.
[7] K.A. Hua, C. Lee, and J.K. Peir, "Interconnecting Shared-Everything Systems for Efficient Parallel Query Processing," Proc. 18th Int'l Conf. Very Large Data Bases, pp. 262-270, 1992.
[8] K.A. Hua, C. Lee, and H.C. Young, ”A Cell-Based Data Partitioning Strategy for Efficient Load Balancing in a Distributed Memory Multicomputer Database System,” Proc. IEEE Int'l Conf. Parallel Processing, Aug. 1991.
[9] K.A. Hua, C. Lee, and H. Young, ”Data Partitioning for Multicomputer Database Systems: A Cell-Based Approach,” Information Systems, vol. 18, no. 5, pp. 329-342, Sept. 1993.
[10] K.A. Hua, C. Lee, and C.M. Hua, "Dynamic Load Balancing in Multicomputer Database Systems Using Partition Tuning," IEEE Trans. Knowledge and Data Eng., vol. 7, no. 6, pp. 968-983, Dec. 1995.
[11] Y.E. Ioannidis and Y.C. Kang,“Randomized algorithms for optimizing large join queries,” Proc. ACM-SIGMOD Conf., vol. 19, pp. 312-321, 1990.
[12] Y.E. Ioannidis and E. Wong,“Query optimization by simulated annealing,” Proc. ACM-SIGMOD Conf., pp. 9-22, 1987.
[13] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, ”Optimization by Simulating Annealing,” Science, vol. 220, no. 4598, pp. 671-680, May 1983.
[14] K.A. Hua,Y.-L. Lo,, and H.C. Young,“Including the load balancing issue in the optimization of multi-way join queries for shared-nothing database computers,” Proc. Second Conf. Parallel and Distributed Information Systems, pp. 74-83, Jan. 1993.
[15] C. Lee and Z.-A. Chang, "Utilizing Page-Level Join Index for Optimization in Parallel Join Execution," IEEE Trans. Knowledge and Data Eng., vol. 7, no. 6, Dec. 1995.
[16] C. Lee and K. Hua, “A Self-Adjusting Data Distribution Mechanism for Multidimensional Load Balancing in Multiprocessor-Based Database Systems,” Information Systems, vol. 19, no. 7, pp. 549-567, 1994.
[17] H. Lam, C. Lee, and S.Y.W. Su, ”A Special Function Unit for Database Operations (SFU-DB): Design and Performance Evaluation,” IEEE Trans. Computers, vol. 40, no. 3, Mar. 1991.
[18] Query Processing in Parallel Relational Database Systems, H. Lu, B.C. Ooi, and K.L. Tan, eds. IEEE CS Press: 1994.
[19] C. Lee and C.-S. Shih, ”Optimizing Large Join Queries Using a Graph-Theoretic Approach,” Technical Report TR 84-6-01, Database Laboratories, Inst. of Information Eng., Nat'l Cheng Kung Univ., Tainan, Taiwan, June 1995.
[20] H. Lu, M.-C. Shan, and K.-L. Tan,“Optimization of multi-way join queries for parallel execution,”inProc. 17th Int. Conf. Very Large Databases, Barcelona, Spain, Sept. 1991, pp. 549–560.
[21] H. Lu, K.-L. Tan, and C. Lee, ”Load Balancing in Pipelined Processing of Multi-Join Queries,” Proc. IEEE Int'l Conf. Parallel and Distributed Systems, 1994.
[22] H. Lu,K. Tan,, and M. Shan,“Hash-based join algorithms for multiprocessor computers with shared memory,” Proc. 16th Int’l Conf. Very Large Data Bases, pp. 198-208, 1990.
[23] R.S.G. Lanzelotte,P. Valduriez,, and M. Zaït,“On the effectiveness of optimization search strategies for parallel execution spaces,” Proc. 19th Int’l Conf. Very Large Databases, pp. 493-504,Dublin, 1993.
[24] U. Manber, Introduction to Algorithms: A Creative Approach. Addison-Wesley, 1989.
[25] D. Schneider and D. J. DeWitt,“Tradeoffs in processing complex join queries via hashing in multiprocessor database machines,”inProc. 16th Int. Conf. Very Large Databases, Brisbane, Australia, Aug. 1990, pp. 469–480
[26] P. Selinger,D. Astrahan,D. Chamberlin,R. Lorie,, and T. Price,“Access path selection in a relational database management system,” Proc. 1979 ACM-SIGMOD Int’l Conf. Management of Data, pp. 23-34,Boston, May 1979.
[27] A. Swami and A. Gupta,“Optimization of large join queries,” Proc. ACM-SIGMOD Conf., pp. 8-17, 1988.
[28] A. Swami,“Optimization of large join queries: Combining heuristics with combinatorial techniques,”inProc. ACM SIGMOD, Chicago, IL, June 1989, pp. 367–376.
[29] E. Shekita, H. C. Young, and K. Tan,“Multijoin optimization for symmetric multiprocessors,”inProc. 19th Int. Conf. Very Large Databases, Aug. 1993, pp. 479–492.
[30] M. Ziane,M. Zait,, and P. Borla-Salamet,“Parallel query processing in DBS,” Proc. Second Conf. Parallel and Distributed Information Systems, pp. 93-102, Jan. 1993.

Index Terms:
Large join query, graph theory, join cost, query tree, query optimization.
Chiang Lee, Chi-Sheng Shih, Yaw-Huei Chen, "Optimizing Large Join Queries Using A Graph-Based Approach," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 2, pp. 298-315, March-April 2001, doi:10.1109/69.917567
Usage of this product signifies your acceptance of the Terms of Use.