Criss-Cross Hash Joins: Design and Analysis
July/August 2001 (vol. 13 no. 4)
pp. 637-653

Abstract—Join processing in relational database systems remains a challenging problem. We propose a criss-cross hash join strategy that draws on both hashing and indexing techniques and inherits the advantages of each. To support the criss-cross hash join, we introduce a simple data structure termed the page map. Page maps help reduce the hashing effort incurred by current hash-based join methods. Furthermore, they implicitly capture and exploit whatever inherent order exists among the tuples of the relations, however partial, to achieve superior performance. Because the proposed methodology relies on hashing, page maps are simpler, more compact, and easier to maintain than the traditional data structures associated with index-based join methods. We first develop the ideas intuitively and then develop the concepts and algorithms formally. We present a detailed probabilistic analysis of the algorithms and assess their performance through extensive empirical investigations. The empirical results suggest significant performance improvements over the current state-of-the-art hybrid hash method, especially when the relations exhibit some inherent order.
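The abstract measures the criss-cross hash join against the hybrid hash method. As background only, here is a minimal in-memory simulation of a conventional hybrid hash equijoin. It is not the criss-cross hash join, and it does not model page maps, because the abstract does not specify their structure. The function name `hybrid_hash_join`, the default partition count, and the use of Python lists in place of spilled disk partitions are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Hashable, Iterable

Row = tuple  # a tuple of attribute values (illustrative alias)


def hybrid_hash_join(build: Iterable[Row], probe: Iterable[Row],
                     build_key: int, probe_key: int,
                     num_partitions: int = 4) -> list[tuple[Row, Row]]:
    """Equijoin on build[build_key] == probe[probe_key] (sketch only).

    Partition 0 of the build relation stays resident as an in-memory
    hash table. Partitions 1..n-1 are "spilled" (simulated here with
    Python lists instead of disk files) and are joined afterwards,
    one partition pair at a time.
    """
    def part(key: Hashable) -> int:
        return hash(key) % num_partitions

    # Phase 1: partition the build relation; partition 0 stays in memory.
    resident: dict[Hashable, list[Row]] = defaultdict(list)
    spilled_build: list[list[Row]] = [[] for _ in range(num_partitions)]
    for r in build:
        k = r[build_key]
        if part(k) == 0:
            resident[k].append(r)
        else:
            spilled_build[part(k)].append(r)

    # Phase 2: partition the probe relation; partition-0 tuples probe at once.
    result: list[tuple[Row, Row]] = []
    spilled_probe: list[list[Row]] = [[] for _ in range(num_partitions)]
    for s in probe:
        k = s[probe_key]
        if part(k) == 0:
            for r in resident.get(k, ()):
                result.append((r, s))
        else:
            spilled_probe[part(k)].append(s)

    # Phase 3: for each spilled pair, build a fresh hash table and probe it.
    for p in range(1, num_partitions):
        table: dict[Hashable, list[Row]] = defaultdict(list)
        for r in spilled_build[p]:
            table[r[build_key]].append(r)
        for s in spilled_probe[p]:
            for r in table.get(s[probe_key], ()):
                result.append((r, s))
    return result


dept = [(10, "sales"), (20, "ops"), (30, "hr")]
emp = [(1, "alice", 10), (2, "bob", 20), (3, "carol", 10), (4, "dave", 40)]
pairs = hybrid_hash_join(dept, emp, build_key=0, probe_key=2)
```

Every tuple is hashed once while it is partitioned and again when a spilled partition is rebuilt. This repeated hashing effort is what the abstract says page maps help reduce. The output order depends on partition assignment, so compare results as sorted lists.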

Index Terms:
Database, relational architecture, query processing, join, hash, index.
Citation:
Ram D. Gopal, R. Ramesh, Stanley Zionts, "Criss-Cross Hash Joins: Design and Analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 4, pp. 637-653, July-Aug. 2001, doi:10.1109/69.940737