Utilizing Page-Level Join Index for Optimization in Parallel Join Execution
December 1995 (vol. 7 no. 6)
pp. 900-914

Abstract: This paper presents a methodology for optimizing parallel join execution. Past research on parallel join methods has mostly focused on designing algorithms that partition relations (e.g., by hashing) and distribute the data buckets as evenly as possible across the processors. Once the data are distributed, these methods assume that all processors will complete their tasks at about the same time. We stress that this holds only if no further information, such as a page-level join index (discussed later in the paper), is available. When such information is available, the join execution can be optimized further, yet the workload across the processors may still be unbalanced. We study the problems that may arise in a shared-nothing architecture and propose algorithms to solve them. A simulation study is also performed to understand the characteristics of the proposed method.
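
To make the imbalance concern concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes a page-level join index can be modeled as a list of (R page, S page) pairs, i.e., the edges of a bipartite graph, and that a processor's join work is proportional to the number of page pairs it must process. The index contents, the processor names, and the helper `page_pairs_per_processor` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical page-level join index: a pair (r_page, s_page) means that page
# r_page of relation R holds at least one tuple that joins with a tuple on
# page s_page of relation S. The pairs form the edges of a bipartite graph.
join_index = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2), (3, 3)]


def page_pairs_per_processor(index, r_page_owner):
    """Count the page pairs each processor must join.

    Assumption: the processor that owns an R page performs every page-pair
    join that involves that page.
    """
    load = defaultdict(int)
    for r_page, _s_page in index:
        load[r_page_owner[r_page]] += 1
    return dict(load)


# Assumed placement: each processor stores exactly two R pages, so the data
# itself is evenly distributed.
r_page_owner = {0: "P0", 2: "P0", 1: "P1", 3: "P1"}

load = page_pairs_per_processor(join_index, r_page_owner)
print(load)
```

Under these assumptions both processors hold the same number of pages, yet P0 must process four page pairs while P1 processes only two, which is the kind of residual imbalance the abstract refers to.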

Index Terms:
Bipartite graph, join index, parallel join execution, query optimization, relational database operations, workload balance.
Chiang Lee, Zue-An Chang, "Utilizing Page-Level Join Index for Optimization in Parallel Join Execution," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 6, pp. 900-914, Dec. 1995, doi:10.1109/69.476496