Optimization of Parallel Execution for Multi-Join Queries
June 1996 (vol. 8 no. 3)
pp. 416-428

Abstract: In this paper, we study how to exploit interoperator parallelism to optimize the execution of multi-join queries. We focus on two major issues: 1) scheduling the execution sequence of the multiple joins within a query, and 2) determining the number of processors to allocate to each join operation obtained in 1). For the first issue, we propose several heuristics for determining general join sequences, or bushy trees, and evaluate them by simulation. Despite their simplicity, the proposed heuristics produce general join sequences that significantly outperform the optimal sequential join sequence, and the quality of the resulting join sequences is shown to be fairly close to that of the optimal one. For the second issue, we show that processor allocation for exploiting interoperator parallelism is subject to more constraints, such as execution dependency and system fragmentation, than processor allocation for intraoperator parallelism within a single join. The concept of synchronous execution time is proposed to alleviate these constraints. Several processor allocation heuristics, categorized as bottom-up or top-down approaches, are derived and evaluated by simulation. The relationship between issues 1) and 2) is also explored. Among all the schemes evaluated, the proposed two-step approach emerges as the best solution for minimizing query execution time: it first applies the join sequence heuristic to build a bushy tree as if for a single-processor system, and then, in light of the concept of synchronous execution time, allocates processors to each join in the bushy tree in a top-down manner.
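
The two-step approach described in the abstract can be illustrated with a small sketch. The abstract does not spell out the heuristics, so everything below is an assumption made for illustration rather than the authors' algorithm: step 1 uses a greedy rule that always joins the connected pair with the smallest estimated result, the cost of a join is taken to be the sum of its input sizes, and step 2 splits a join's processors between its two child subtrees in proportion to their total work so that both subtrees finish at roughly the same time. The names `Node`, `build_bushy_tree`, `work`, and `allocate` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, Optional


@dataclass
class Node:
    rels: FrozenSet[str]            # base relations covered by this subtree
    size: float                     # estimated cardinality of the result
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    procs: int = 0                  # processors assigned to this join

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def cross_selectivity(a: Node, b: Node, sel: Dict[FrozenSet[str], float]):
    """Product of the selectivities of all join predicates linking a and b,
    or None when no predicate links them (a Cartesian product)."""
    factor, connected = 1.0, False
    for x in a.rels:
        for y in b.rels:
            key = frozenset((x, y))
            if key in sel:
                factor *= sel[key]
                connected = True
    return factor if connected else None


def build_bushy_tree(sizes: Dict[str, float], sel: Dict[FrozenSet[str], float]) -> Node:
    """Step 1 (assumed greedy rule): repeatedly join the connected pair of
    subtrees whose estimated result is smallest, ignoring processors."""
    nodes = [Node(frozenset([r]), float(s)) for r, s in sizes.items()]
    while len(nodes) > 1:
        best = None
        for a, b in combinations(nodes, 2):
            f = cross_selectivity(a, b, sel)
            if f is None:
                continue
            out = a.size * b.size * f
            if best is None or out < best[0]:
                best = (out, a, b)
        if best is None:
            raise ValueError("join graph is not connected")
        out, a, b = best
        nodes = [n for n in nodes if n is not a and n is not b]
        nodes.append(Node(a.rels | b.rels, out, a, b))
    return nodes[0]


def work(node: Node) -> float:
    """Total work of a subtree under the assumed cost model:
    each join costs the sum of its two input sizes."""
    if node.is_leaf:
        return 0.0
    return node.left.size + node.right.size + work(node.left) + work(node.right)


def allocate(node: Node, procs: int) -> None:
    """Step 2 (top-down): give a join `procs` processors, then divide them
    between its child joins in proportion to subtree work."""
    if node.is_leaf:
        return
    node.procs = procs
    kids = [c for c in (node.left, node.right) if not c.is_leaf]
    if len(kids) == 1:
        allocate(kids[0], procs)
    elif len(kids) == 2:
        if procs < 2:
            # Not enough processors to overlap the subtrees; run them in turn.
            for k in kids:
                allocate(k, procs)
            return
        w0, w1 = work(kids[0]), work(kids[1])
        p0 = min(max(round(procs * w0 / (w0 + w1)), 1), procs - 1)
        allocate(kids[0], p0)
        allocate(kids[1], procs - p0)


# Example: a chain query A - B - C - D on 8 processors (made-up statistics).
sizes = {"A": 1000, "B": 1000, "C": 3000, "D": 3000}
sel = {frozenset("AB"): 0.0005, frozenset("BC"): 0.002, frozenset("CD"): 0.0001}
root = build_bushy_tree(sizes, sel)
allocate(root, 8)
```

With these made-up statistics the greedy step produces the bushy tree (A ⋈ B) ⋈ (C ⋈ D). The top-down step then gives the root all 8 processors and splits them 2 and 6 between the two child joins, so that work divided by processors is the same for both subtrees. A real optimizer would need a calibrated cost model and would also have to handle the fragmentation and dependency constraints the abstract mentions, which this sketch ignores.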

Index Terms:
Bushy trees, synchronous execution time, multi-join query, execution dependency, system fragmentation.
Citation:
Ming-Syan Chen, Philip S. Yu, Kun-Lung Wu, "Optimization of Parallel Execution for Multi-Join Queries," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 3, pp. 416-428, June 1996, doi:10.1109/69.506709