This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Dynamic Load Balancing in Very Large Shared-Nothing Hypercube Database Computers
December 1993 (vol. 42 no. 12)
pp. 1425-1439

Two relational join strategies, broadcast-based join and bucket-based join, have been recently proposed for the hypercube interconnection topology. The first strategy, however, incurs many unnecessary comparisons of pairs of tuples of the two relations. Although the second technique compares only tuples of the relevant buckets, it may suffer from potential load imbalance, which is most critical to the performance of a shared-nothing system. To address these issues, a join algorithm for hypercube computers which includes dynamic load balancing capabilities to minimize the effect of skewness in tuple distribution is proposed. Simulation results indicate that the scheme provides significant improvement over the bucket-based join strategy. In fact, the technique is consistently superior even when the skew condition is very mild.

[1] C. K. Baru and O. Frieder, "Database operations in a cube-connected multicomputer system,"IEEE Trans. Comput., vol. 38, no. 6, pp. 920-927, 1989.
[2] H. Boral, W. Alexander, L. Clay, G. Copeland, S. Danforth, M. Franklin, B. Hart, M. Smith, and P. Valduriez, "Prototyping Bubba, a highly parallel database system,"IEEE Trans. Knowledge Data Engineering, vol. 2, no. 1, pp. 4-24, 1990.
[3] H. Boral, D. J. DeWitt, D. Friedland, N. F. Jarrell, and W. K. Wilkinson, "Implementation of the database machine direct,"IEEE Trans. Software Eng., vol. SE-8, no. 6, pp. 533-543, Sept. 1982.
[4] D. J. DeWitt and R. Gerber, "Multiprocessor hash-based join algorithms," inProc. '85 Int. Conf. VLDB, pp. 151-164, 1985.
[5] D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H.-I. Hsiao, and R. Rassmussen, "The Gamma database machine project,"IEEE Trans. Knowledge Data Eng., vol. 2, no. 1, pp. 44-62, 1990.
[6] D. J. DeWittet al., "Implementation techniques for main memory databases," inProc. ACM Sigmod(Boston, MA), June 18-21, 1984, pp. 1-8.
[7] S. Englert, J. Gray, T. Kocher, and P. Shah, "A benchmark of nonstop SQL release 2 demonstrating near-linear speedup and scaleup on large databases," inSIGMETRICS90, pp. 245-246, May 1990.
[8] O. Frieder, "Multiprocessor algorithms for relational database operators on hypercube systems,"IEEE Computers, vol. 23, no. 11, pp. 13-28, Nov. 1990.
[9] Y. Hirano, T. Satoh, U. Inoue, and K. Teranaka, "Load balancing algorithms for parallel database processing on shared memory multiprocessors," inProc. Int. Conf. Parallel Distributed Info. Syst., pp. 210-217, Miami Beach, FL, Dec. 1991.
[10] K. A. Hua and H. C. Young, "Designing a highly parallel database server using off-the-shelf components," inProc. Int. Computer Symp., Hsinchu, Taiwan, Dec. 1990, pp. 47-54.
[11] K. A. Hua and C. Lee, "An adaptive data placement Scheme for parallel database computer systems," inProc. 16th Int. Conf. VLDB, Brisbane, Australia, Aug. 1990, pp. 493-506.
[12] K. A. Hua and C. Lee, "Handling data skew in multiprocessor database computers using partition tuning," inProc. Int. Conf. VLDB, Barcelona, Spain, Sept. 1991, pp. 525-535.
[13] K. A. Hua, C. Lee, and J.-K. Peir, "Interconnecting shared-nothing systems for concurrent query processing," inProc. Int. Conf. Parallel Distributed Info. Syst., Miami Beach, FL., Dec. 1991, pp. 262-270.
[14] H. F. Jordan, "A special purpose architecture or finite element analysis," inProc. Int. Conf. Parallel Processing. Aug. 1978, pp. 263-266.
[15] M. Kitsuregawa and Y. Ogawa, "Bucket spreading parallel hash: A new robust, parallel hash join method for data skew in the super database computer (SDC)," inProc. 16th Int. Conf. VLDB, Brisbane, Australia, Aug. 1990, pp. 210-221.
[16] M. Kitsuregawa, H. Tanaka, and T. Moto-oka, "Application of hash to database machine and the architecture,"New, Generation Comput., vol. 1, no. 1, pp. 66-74, 1983.
[17] B. Charron-Bost, "Combinatorics and Geometry of Consistent Cuts: Application to Concurrency Theory," inDistributed Algorithms, J.-C. Bermond and M. Raynal. eds.,Lecture Notes in Computer Science, Vol. 392, Springer-Verlag, Berlin, 1989.
[18] S. Lakshmi and P. S. Yu, "Limiting factors of join performance on parallel processors,"Proc. 5th Int. Conf. Data Eng., Feb. 1989, pp. 488-496.
[19] R. Lorie, J.-J. Daudenarde, G. Hallmark, J. Stamos, and H. Young, "Adding intra-transaction parallelism to an existing dbms: Early experience,"IEEE Data Eng. Bulletin, vol. 12, no 1, pp. 2-8, Mar. 1989.
[20] nCUBE, Beaverton, Oregon,nCUBE 2 Supercomputers: Technical Overview, 1990.
[21] S. F. Nugent, "The iPSC/2 Direct-Connect communications technology," inProc. Third Conf. Hypercube Comput. Appl., Pasadena, CA, Jan. 1988, pp. 56-60.
[22] E. Omiecinski, "Performance analysis of a load balancing hash-join algorithm for a shared memory multiprocessor," inProc. Int. Conf. VLDB, Barcelona, Spain, Sept. 1991, pp. 375-385.
[23] E. Omiecinski and E. Lin, "Hash-based and index-based join algorithms for cube and ring connected multicomputers,"IEEE Trans. Knowledge Data Eng., vol. 1, no. 3, pp. 329-342, Sept. 1989.
[24] E. Omiecinski and E. Tien, "A hash-based join algorithm for a cube-connected parallel computer,"Info. Proc. Lett., vol. 30, no. 5, pp. 269-275, Mar. 1989.
[25] K. Rudin, "Oracle for massively parallel systems," Tech. Rep. 51313- 0990, Oracle Corporation, 1990.
[26] G. M. Sacco, "Fragmentation: A technique for efficient query processing,"ACM Trans. Database Syst., vol. 11, no. 2, pp. 113-133, 1986.
[27] D. Schneider and D. Dewitt, "A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment," inProc. ACM SIGMOD Conf.(Portland, OR), May-June 1989, p. 110.
[28] M. Stonebraker, "The case for shared nothing,"IEEE Database Eng., vol. 9, no. 1, pp. 4-9, 1986.
[29] M. Stonebraker, "The design of XPRS," inProc. 14th Int. Conf. VLDB, pp. 318-330, Los Angeles, Aug. 1988.
[30] Teradata Corporation, Los Angeles, California,Teradata DBC/1012 Data Base Computer Concepts und Facilities, Release 3.1 ed., 1988, Teradata Document C02-0001-05.
[31] S. G. Tucker, "The IBM 3090 system: An overview,"IBM Systems J., vol. 25, no. 1, pp. 4-19, 1986.
[32] C. Turbyfill, "Comparative Benchmark of Relational Database Systems," Ph.D. dissertation, Cornell Univ. Sept. 1987.
[33] J. L. Wolf, D. Dias, and P. S. Yu, "An effective algorithm for parallelizing sort merge joins in the presence of data skew," inProc. Second Int. Conf. Databases in Parallel and Distributed Syst., Dublin, Ireland, July 1990, pp. 103-115.
[34] J. I. Wolf, D. M. Dias, P. S. Yu, and J Turek, "An efficient algorithm for parallelizing hash joins in presence of data skew," inProc. Int. Conf. Data Eng., pp. 200-209, Kobe, Japan, Apr. 1991.
[35] G. K. Zipf,Humun Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Reading, MA: Addison-Wesley, 1949.

Index Terms:
database management systems; hypercube networks; parallel architectures; relational databases; resource allocation; load balancing; shared-nothing hypercube; database computers; relational join strategies; broadcast-based join; bucket-based join; dynamic load balancing; skewness; tuple distribution; parallel join; relational database.
Citation:
K.A. Hua, J.X.W. Su, "Dynamic Load Balancing in Very Large Shared-Nothing Hypercube Database Computers," IEEE Transactions on Computers, vol. 42, no. 12, pp. 1425-1439, Dec. 1993, doi:10.1109/12.260633
Usage of this product signifies your acceptance of the Terms of Use.