Permutation-Based Range-Join Algorithms on N-Dimensional Meshes
April 2002 (vol. 13 no. 4)
pp. 413-431

In this paper, we present four efficient parallel algorithms for computing a nonequijoin, called range-join, of two relations on N-dimensional mesh-connected computers. The range-join of relations R and S is an important generalization of the conventional equijoin and band-join, and all of the proposed algorithms compute it with permutation-based approaches. In general, after sorting all subsets of both relations, the proposed algorithms permute every sorted subset of relation S to each processor in turn, where it is joined with the local subset of relation R. To permute the subsets of S efficiently, we propose two data permutation approaches: the shifting approach, which permutes the data recursively from lower dimensions to higher dimensions, and the Hamiltonian-cycle approach, which first constructs a Hamiltonian cycle on the mesh and then permutes the data along this cycle by repeatedly transferring data from each processor to its successor. We apply the shifting approach to meshes with different storage capacities, which results in two different join algorithms. The Basic Shifting Join (BASHJ) algorithm minimizes the number of subsets stored temporarily at a processor but requires a large number of data transmissions, while the Buffering Shifting Join (BUSHJ) algorithm achieves high parallelism and minimizes the number of data transmissions but requires a large number of subsets to be stored at each processor. For constructing a Hamiltonian cycle on a mesh, we propose two different methods, which also result in two different join algorithms. The Recursive Hamiltonian-Cycle Join (REHCJ) algorithm uses a single processor to construct a Hamiltonian cycle recursively, while the Parallel Hamiltonian-Cycle Join (PAHCJ) algorithm uses all processors to construct a Hamiltonian cycle in parallel. We analyze and compare these algorithms. The results show that both Hamiltonian-cycle algorithms require less storage and fewer local join operations than the shifting algorithms, but more data movement steps.
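
The Hamiltonian-cycle approach described above can be illustrated with a small sequential simulation. The sketch below is not the paper's REHCJ or PAHCJ construction: it assumes a 2-dimensional mesh with an even number of rows, builds the cycle with a simple "snake" traversal, and uses a nested-loop local join. The join predicate `e1 <= |r - s| <= e2` and all function names are assumptions chosen for illustration, since the abstract does not state them.

```python
def hamiltonian_cycle(rows, cols):
    """Return a Hamiltonian cycle on a rows x cols mesh as a list of (row, col).

    Assumption of this sketch: rows is even and cols >= 2. Column 0 is kept
    free as the return path; the remaining columns are covered in a snake.
    """
    if rows < 2 or rows % 2 or cols < 2:
        raise ValueError("sketch needs an even number of rows and cols >= 2")
    cycle = [(0, 0)]
    for r in range(rows):
        # Even rows run left to right, odd rows right to left (columns 1..cols-1).
        cs = range(1, cols) if r % 2 == 0 else range(cols - 1, 0, -1)
        cycle.extend((r, c) for c in cs)
    # Walk back up column 0 so the last node is a neighbour of (0, 0).
    cycle.extend((r, 0) for r in range(rows - 1, 0, -1))
    return cycle


def local_range_join(r_part, s_part, e1, e2):
    """Nested-loop join of two local subsets under the assumed range predicate."""
    return [(r, s) for r in r_part for s in s_part if e1 <= abs(r - s) <= e2]


def permutation_range_join(r_parts, s_parts, rows, cols, e1, e2):
    """Simulate permuting every S subset past every processor along the cycle.

    r_parts and s_parts map processor coordinates (row, col) to lists of keys.
    """
    cycle = hamiltonian_cycle(rows, cols)
    p = len(cycle)
    held = {proc: sorted(s_parts[proc]) for proc in cycle}  # S subset now at proc
    result = []
    for _ in range(p):
        # Every processor joins its fixed R subset with the S subset it holds.
        for proc in cycle:
            result.extend(local_range_join(r_parts[proc], held[proc], e1, e2))
        # Every processor then passes its S subset to its successor on the cycle.
        held = {cycle[(i + 1) % p]: held[cycle[i]] for i in range(p)}
    return result
```

After as many rounds as there are processors, each S subset has visited every processor exactly once, so the collected pairs should match a join over the unpartitioned relations. A real mesh would run the inner loop on all processors concurrently and exploit the sorted order of the subsets instead of a nested loop.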

Index Terms:
analysis of algorithms, data permutation, N-dimensional meshes, relational databases, parallel processing, performance, range-join operations
Citation:
Shao Dong Chen, Hong Shen, Rodney Topor, "Permutation-Based Range-Join Algorithms on N-Dimensional Meshes," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 4, pp. 413-431, April 2002, doi:10.1109/71.995821