Applying Segmented Right-Deep Trees to Pipelining Multiple Hash Joins
August 1995 (vol. 7 no. 4)
pp. 656-668

Abstract: This paper explores the pipelined execution of multijoin queries in a multiprocessor-based database system. With hash-based joins, multiple joins can be pipelined so that early results from one join are sent to the next join for processing before the first join has completed. The execution of a query is usually denoted by a query execution tree. To improve the execution of pipelined hash joins, we propose an innovative approach to query execution tree selection that exploits segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment and then, in light of the model, develop heuristic schemes that determine a query execution plan based on a segmented right-deep tree so that the query can be executed efficiently. As shown by our simulation, the proposed approach incurs no additional overhead on plan execution, offers more flexibility in query plan generation, and can lead to query plans that perform better than those achievable by previous schemes using right-deep trees.
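
The pipelining idea described in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's algorithm or its analytical model: the function names (`build_hash_table`, `probe`, `pipelined_right_deep_join`) and the dictionary row format are assumptions made only for illustration. It shows one right-deep pipeline segment in which every hash table is built first and probe tuples then flow through all joins one at a time.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]  # hypothetical row format: column name -> value


def build_hash_table(rows: Iterable[Row], key: str) -> Dict[object, List[Row]]:
    """Build phase: hash the inner (building) relation on its join column."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(stream: Iterable[Row], table: Dict[object, List[Row]], key: str) -> Iterator[Row]:
    """Probe phase: emit each joined tuple as soon as it is found."""
    for row in stream:
        for match in table.get(row[key], ()):
            yield {**match, **row}


def pipelined_right_deep_join(
    probe_rows: Iterable[Row],
    build_sides: List[Tuple[Iterable[Row], str]],
) -> Iterator[Row]:
    """One right-deep segment: build all hash tables, then chain the probes.

    Because each probe is a generator, a tuple produced by one join is
    passed to the next join before the earlier join has finished.
    """
    tables = [(build_hash_table(rows, key), key) for rows, key in build_sides]
    stream: Iterator[Row] = iter(probe_rows)
    for table, key in tables:
        stream = probe(stream, table, key)
    return stream


# Illustrative data only.
customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}]
products = [{"pid": 10, "title": "pen"}]
orders = [
    {"oid": 100, "cid": 1, "pid": 10},
    {"oid": 101, "cid": 2, "pid": 11},
    {"oid": 102, "cid": 3, "pid": 10},
]

joined = list(pipelined_right_deep_join(orders, [(customers, "cid"), (products, "pid")]))
```

In this sketch, a segmented right-deep tree would correspond to materializing the output of one such segment (for example with `list(...)`) and using it as a building relation of another segment. How the segments are chosen is the subject of the paper's heuristics and is not modeled here.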

Index Terms:
Pipelining, parallel query processing, bushy trees, right-deep trees, hash joins.
Ming-Syan Chen, Mingling Lo, Philip S. Yu, Honesty C. Young, "Applying Segmented Right-Deep Trees to Pipelining Multiple Hash Joins," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 4, pp. 656-668, Aug. 1995, doi:10.1109/69.404036