
This Article  
 
Share  
Bibliographic References  
Add to:  
Digg Furl Spurl Blink Simpy Del.icio.us Y!MyWeb  
Search  
 
ASCII Text  x  
Chiang Lee, ChiSheng Shih, YawHuei Chen, "Optimizing Large Join Queries Using A GraphBased Approach," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 2, pp. 298315, March/April, 2001.  
BibTex  x  
@article{ 10.1109/69.917567, author = {Chiang Lee and ChiSheng Shih and YawHuei Chen}, title = {Optimizing Large Join Queries Using A GraphBased Approach}, journal ={IEEE Transactions on Knowledge and Data Engineering}, volume = {13}, number = {2}, issn = {10414347}, year = {2001}, pages = {298315}, doi = {http://doi.ieeecomputersociety.org/10.1109/69.917567}, publisher = {IEEE Computer Society}, address = {Los Alamitos, CA, USA}, }  
RefWorks Procite/RefMan/Endnote  x  
TY  JOUR JO  IEEE Transactions on Knowledge and Data Engineering TI  Optimizing Large Join Queries Using A GraphBased Approach IS  2 SN  10414347 SP298 EP315 EPD  298315 A1  Chiang Lee, A1  ChiSheng Shih, A1  YawHuei Chen, PY  2001 KW  Large join query KW  graph theory KW  join cost KW  query tree KW  query optimization. VL  13 JA  IEEE Transactions on Knowledge and Data Engineering ER   
Abstract—Although many query tree optimization strategies have been proposed in the literature, there still is a lack of a formal and complete representation of all possible permutations of query operations (i.e., execution plans) in a uniform manner. A graphtheoretic approach presented in this paper provides a sound mathematical basis for representing a query and searching for an execution plan. In this graph model, a node represents an operation and a directed edge between two nodes indicates the order of executing these two operations in an execution plan. Each node is associated with a weight and so is an edge. The weight is an expression containing optimization required parameters, such as relation size, tuple size, join selectivity factors. All possible execution plans are representable in this graph and each spanning tree of the graph becomes an execution plan. It is a general model which can be used in the optimizer of a DBMS for interal query representation. On the basis of this model, we devise an algorithm that finds a near optimal execution plan using only polynomial time. The algorithm is compared with a few other popular optimization methods. Experiments show that the proposed algorithm is superior to the others under most circumstances.
[1] M.S. Chen, M.L. Lo, P. S. Yu, and H. C. Young,“Using segmented rightdeep trees for the execution of pipelined hash joins,”inProc. 18th Int. Conf. Very Large Databases, Aug. 1992, pp. 15–26.
[2] G. Chartrand and O.R. Oellermann, Applied and Algorithmic Graph Theory. McGrawHill, 1993.
[3] D.J. DeWitt, R.H. Katz, F. Olken, L.D. Shapiro, and M.R. Stonebraker, “Implementation Techniques for Main Memory Database Systems,” Proc. ACM SIGMOD, 1984.
[4] W.C. Hou, G. Ozsoyoglu, and E. Dogdu, "ErrorConstrained Count Query Evaluation in Relational Databases," Proc. ACM SIGMOD Conf., pp. 278287, Aug. 1991.
[5] K.A. Hua and C. Lee, ”An Adaptive Data Placement Scheme for Parallel Database Computer Systems,” Proc. Int'l Conf. Very Large Data Bases, 1990.
[6] K. Hua and C. Lee,“Handling data skew in multiprocessor database computers using partition tuning,”inProc. 17th Int. Conf. Very Large Databases, Barcelona, Spain, Sept. 1991, pp. 525–535.
[7] K.A. Hua, C. Lee, and J.K. Peir, "Interconnecting SharedEverything Systems for Efficient Parallel Query Processing," Proc. 18th Int'l Conf. Very Large Data Bases, pp. 262270, 1992.
[8] K.A. Hua, C. Lee, and H.C. Young, ”A CellBased Data Partitioning Strategy for Efficient Load Balancing in a Distributed Memory Multicomputer Database System,” Proc. IEEE Int'l Conf. Parallel Processing, Aug. 1991.
[9] K.A. Hua, C. Lee, and H. Young, ”Data Partitioning for Multicomputer Database Systems: A CellBased Approach,” Information Systems, vol. 18, no. 5, pp. 329342, Sept. 1993.
[10] K.A. Hua, C. Lee, and C.M. Hua, "Dynamic Load Balancing in Multicomputer Database Systems Using Partition Tuning," IEEE Trans. Knowledge and Data Eng., vol. 7, no. 6, pp. 968983, Dec. 1995.
[11] Y.E. Ioannidis and Y.C. Kang,“Randomized algorithms for optimizing large join queries,” Proc. ACMSIGMOD Conf., vol. 19, pp. 312321, 1990.
[12] Y.E. Ioannidis and E. Wong,“Query optimization by simulated annealing,” Proc. ACMSIGMOD Conf., pp. 922, 1987.
[13] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, ”Optimization by Simulating Annealing,” Science, vol. 220, no. 4598, pp. 671680, May 1983.
[14] K.A. Hua,Y.L. Lo,, and H.C. Young,“Including the load balancing issue in the optimization of multiway join queries for sharednothing database computers,” Proc. Second Conf. Parallel and Distributed Information Systems, pp. 7483, Jan. 1993.
[15] C. Lee and Z.A. Chang, "Utilizing PageLevel Join Index for Optimization in Parallel Join Execution," IEEE Trans. Knowledge and Data Eng., vol. 7, no. 6, Dec. 1995.
[16] C. Lee and K. Hua, “A SelfAdjusting Data Distribution Mechanism for Multidimensional Load Balancing in MultiprocessorBased Database Systems,” Information Systems, vol. 19, no. 7, pp. 549567, 1994.
[17] H. Lam, C. Lee, and S.Y.W. Su, ”A Special Function Unit for Database Operations (SFUDB): Design and Performance Evaluation,” IEEE Trans. Computers, vol. 40, no. 3, Mar. 1991.
[18] Query Processing in Parallel Relational Database Systems, H. Lu, B.C. Ooi, and K.L. Tan, eds. IEEE CS Press: 1994.
[19] C. Lee and C.S. Shih, ”Optimizing Large Join Queries Using a GraphTheoretic Approach,” Technical Report TR 84601, Database Laboratories, Inst. of Information Eng., Nat'l Cheng Kung Univ., Tainan, Taiwan, June 1995.
[20] H. Lu, M.C. Shan, and K.L. Tan,“Optimization of multiway join queries for parallel execution,”inProc. 17th Int. Conf. Very Large Databases, Barcelona, Spain, Sept. 1991, pp. 549–560.
[21] H. Lu, K.L. Tan, and C. Lee, ”Load Balancing in Pipelined Processing of MultiJoin Queries,” Proc. IEEE Int'l Conf. Parallel and Distributed Systems, 1994.
[22] H. Lu,K. Tan,, and M. Shan,“Hashbased join algorithms for multiprocessor computers with shared memory,” Proc. 16th Int’l Conf. Very Large Data Bases, pp. 198208, 1990.
[23] R.S.G. Lanzelotte,P. Valduriez,, and M. Zaït,“On the effectiveness of optimization search strategies for parallel execution spaces,” Proc. 19th Int’l Conf. Very Large Databases, pp. 493504,Dublin, 1993.
[24] U. Manber, Introduction to Algorithms: A Creative Approach. AddisonWesley, 1989.
[25] D. Schneider and D. J. DeWitt,“Tradeoffs in processing complex join queries via hashing in multiprocessor database machines,”inProc. 16th Int. Conf. Very Large Databases, Brisbane, Australia, Aug. 1990, pp. 469–480
[26] P. Selinger,D. Astrahan,D. Chamberlin,R. Lorie,, and T. Price,“Access path selection in a relational database management system,” Proc. 1979 ACMSIGMOD Int’l Conf. Management of Data, pp. 2334,Boston, May 1979.
[27] A. Swami and A. Gupta,“Optimization of large join queries,” Proc. ACMSIGMOD Conf., pp. 817, 1988.
[28] A. Swami,“Optimization of large join queries: Combining heuristics with combinatorial techniques,”inProc. ACM SIGMOD, Chicago, IL, June 1989, pp. 367–376.
[29] E. Shekita, H. C. Young, and K. Tan,“Multijoin optimization for symmetric multiprocessors,”inProc. 19th Int. Conf. Very Large Databases, Aug. 1993, pp. 479–492.
[30] M. Ziane,M. Zait,, and P. BorlaSalamet,“Parallel query processing in DBS,” Proc. Second Conf. Parallel and Distributed Information Systems, pp. 93102, Jan. 1993.