This Article 
 Bibliographic References 
 Add to: 
Dynamic Load Balancing in Multicomputer Database Systems Using Partition Tuning
December 1995 (vol. 7 no. 6)
pp. 968-983

Abstract—Shared nothing multiprocessor architecture is known to be more scalable to support very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this hardware structure is very sensitive to the skew in tuple distribution. Unless the parallel hash join algorithm includes some dynamic load balancing mechanism, the skew effect can severely deteriorate the system performance. In this paper, we investigate this issue. In particular, three parallel hash join algorithms are presented. We implement a simulator to study the effectiveness of these schemes. The simulation model is validated by comparing the simulation results to those produced by the actual implementation of the algorithms running on a multiprocessor system. Our performance study indicates that a naive approach is not able to provide tangible savings. However, the carefully designed strategies can offer substantial improvement over conventional techniques for a wide range of skew conditions.

[1] F. Barlos and O. Frieder,“On the development of a site selection optimizer for distributed andparallel database systems,” Proc. Conf. Information and Knowledge Management, pp. 684-693,Washington, D.C., Nov. 1993.
[2] H. Boral,W. Alexander,L. Clay,G. Copeland,S. Danforth,M. Franklin,B. Hart,M. Smith,, and P. Valduriez,“Prototyping Bubba, a highly parallel database system,” IEEE Trans. on Knowledge and Data Engineering, vol. 2, no. 1, pp. 4-24, Mar. 1990.
[3] D.J. DeWitt and R. Gerber,“Multiprocessor hash-based join algorithms,” Proc.’85 Int’l Conf. VLDB, pp. 151-164, 1985.
[4] D.J. DeWitt,S. Ghandeharizadeh,D.A. Schneider,A. Bricker,H.I. Hsiao,, and R. Rasmussen,“The gamma database machine project,” IEEE Trans. on Knowledge and Data Engineering, vol. 2, no. 1, pp. 44-62, Mar. 1990.
[5] D.J. DeWitt, R.H. Katz, F. Olken, L.D. Shapiro, and M.R. Stonebraker, “Implementation Techniques for Main Memory Database Systems,” Proc. ACM SIGMOD, 1984.
[6] D. DeWitt, J. Naughton, D. Schneider, and S. Seshadri,“Practical skew handling in parallel joins,”inProc. 18th Int. Conf. Very Large Databases, Vancouver, B.C., Aug. 1992, pp. 27–40.
[7] S. Englert,J. Gray,T. Kocher,, and P. Shah,“A benchmark of nonstop SQL release 2 demonstrating near-linear speedupand scaleup on large databases,” Proc. SIGMETRICS 90, pp. 245-246, May 1990.
[8] O. Frieder,“Multiprocessor algorithms for relational-database operations on hypercube systems,” Computer, pp. 13-28, Nov. 1990.
[9] J. L. Gustafson,“Reevaluating Amdahl's law,”Commun. ACM, vol. 31, no. 5, pp. 532–533, 1988.
[10] R. Holbrook,“Nonstop SQL—A distributed relational dbms for oltp,” Proc. Compcon’88,San Francisco, Feb. 1988.
[11] K.A. Hua and C. Lee,“An adaptive data placement scheme for parallel database computer systems,” Proc. Int’l Conf. VLDB, 1990.
[12] K. Hua and C. Lee,“Handling data skew in multiprocessor database computers using partition tuning,”inProc. 17th Int. Conf. Very Large Databases, Barcelona, Spain, Sept. 1991, pp. 525–535.
[13] K.A. Hua,Y.-L. Lo,, and H.C. Young,“Optimizer-assisted load balancing techniques for multicomputer databasemanagement systems,” J. Parallel and Distributed Computing.
[14] K.A. Hua,Y.-L. Lo,, and H.C. Young,“Effective skew handling in parallel multiway joins,” VLDB J., vol. 2, no. 3, pp. 303-330, July 1993.
[15] K.A. Hua and J.X.-W. Su,“Dynamic load balancing in very large shared-nothing hypercube databasecomputers,” IEEE Trans. Computers, vol. 42, no. 12, pp. 1,425-1,439, Dec. 1993.
[16] K.A. Hua,W. Tavanapong,, and H. Young,“Performance evalutation of load balancing techniques for multicomputerdatabase systems,” Proc. Int’l Conf. Data Eng.,Taipei, Mar. 1995.
[17] K.A. Hua and H.C. Young,“Designing a highly parallel database server using off-the-shelfcomponents,” Proc. Int’l Computer Symp., pp. 47-54,Hsinchu, Taiwan, Dec. 1990.
[18] H.F. Jordan,“A special purpose architecture for finite element analysis,” Proc. 1978 Int’l Conf. Parallel Processing, pp. 263-266, 1978.
[19] M. Kitsuregawa and Y. Ogawa, “Bucket Spreading Parallel Hash: A New Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC),” Proc. 16th Conf. Very Large Databases (VLDB), pp. 210-221, 1990.
[20] M. Kitsuregawa,H. Tanaka,, and T. Moto-oka,“Application of hash to database machine and its architecture,” New Generation Computing, vol. 1, no. 1, pp. 66-74, 1983.
[21] S. Lakshmi and P.S. Yu,“Effect of skew on join performance in parallel architectures,” Proc. Int’l Symp. Databases in Parallel and Distributed Systems, pp. 107-117,Austin, Tex., Dec. 1988.
[22] M.S. Lakshmi and P.S. Yu,“Limiting factors of join performance on parallel processors, “Proc. Data Eng. Conf., pp. 488-496, 1989.
[23] D.H. Lawrie,“Access and alignment of data in an array processor,” IEEE Trans. Computers, vol. 24, no. 12, pp. 1,145-1,155, Dec. 1975.
[24] R.A. Lorie,J.-J. Daudenarde,J.W. Stamos,, and H.C. Young,“Exploiting database parallelism in a message-passing multiprocessor,” IBM J. of Research and Development, vol. 35, nos. 5/6, pp. 681-695, Sept./Nov. 1991.
[25] nCUBE 2 Supercomputers: Technical Overview.Beaverton, Ore.: nCUBE, 1990.
[26] E. Omiecinski and E. Tien,“A hash-based join algorithm for a cube-connected parallel computer,” Information Processing Letters, vol. 30, no. 5, pp. 269-275, Mar. 1989.
[27] D. Reed and R. Fujimoto, Multicomputer Networks: Message-Based Parallel Processing. MIT Press, 1987.
[28] G. Sacco,“Fragmentation: A technique for efficient query processing,” ACM Trans. on Database Systems, vol. 11, no. 2, pp. 113-133, June 1986.
[29] D. Schneider and D. DeWitt, “A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment,” ACM SIGMOD Record, vol. 18, no. 2, pp. 110-121, June 1989.
[30] M. Stonebraker,“The case for shared nothing,” IEEE Database Engineering, vol. 9, no. 1, pp. 4-9, Mar. 1986.
[31] Teradata DBC/1012 Data Base Computer Concepts and Facilities, release 3.1 edition, 1988.Los Angeles: Teradata Corporation, Document C02-0001-05.
[32] C. Turbyfill,“Comparative benchmark of relational database systems,” PhD thesis, Cornell Univ., Sept. 1987.
[33] C.B. Walton, A.G. Dale, and R.M. Jenevein, “A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins,” Proc. 17th Conf. Very Large Databases (VLDB), pp. 537-48, Sept. 1991.
[34] J. Wolf, D. Dias, P. Yu, and J. Turek,“An effective algorithm for parallelizing hash joins in the presence of data skew,”inProc. 7th Int. Conf. Data Eng., Kobe, Japan, Apr. 1991, pp. 200–209.
[35] G.K. Zipf,Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology.Reading, Mass.: Addison-Wesley, 1949.

Index Terms:
Database machine, load balancing, parallel join algorithm, query processing, relational database.
Kien A. Hua, Chiang Lee, Chau M. Hua, "Dynamic Load Balancing in Multicomputer Database Systems Using Partition Tuning," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 6, pp. 968-983, Dec. 1995, doi:10.1109/69.476502
Usage of this product signifies your acceptance of the Terms of Use.