This Article 
 Bibliographic References 
 Add to: 
Optimizing Main-Memory Join on Modern Hardware
July/August 2002 (vol. 14 no. 4)
pp. 709-730

In the past decade, the exponential growth in commodity CPU's speed has far outpaced advances in memory latency. A second trend is that CPU performance advances are not only brought by increased clock rate, but also by increasing parallelism inside the CPU. Current database systems have not yet adapted to these trends and show poor utilization of both CPU and memory resources on current hardware. In this paper, we show how these resources can be optimized for large joins and translate these insights into guidelines for future database architectures, encompassing data structures, algorithms, cost modeling, and implementation. In particular, we discuss how vertically fragmented data structures optimize cache performance on sequential data access. On the algorithmic side, we refine the partitioned hash-join with a new partitioning algorithm called radix-cluster, which is specifically designed to optimize memory access. The performance of this algorithm is quantified using a detailed analytical model that incorporates memory access costs in terms of a limited number of parameters, such as cache sizes and miss penalties. We also present a calibration tool that extracts such parameters automatically from any computer hardware. The accuracy of our models is proven by exhaustive experiments conducted with the Monet database system on three different hardware platforms. Finally, we investigate the effect of implementation techniques that optimize CPU resource usage. Our experiments show that large joins can be accelerated almost an order of magnitude on modern RISC hardware when both memory and CPU resources are optimized.

[1] T. Mowry, "Tolerating Latency Through Software Controlled Data Prefetching," PhD Thesis, Dept. of Computer Science, Stanford Univ., Palo, Alto, Calif., Mar. 1994.
[2] A. Ailamaki et al., "DBMSs on a Modern Processor: Where Does the Time Go?" Proc. 25th Int'l Conf. Very Large Databases (VLDB 99), Morgan Kaufmann, 1999, pp. 15-26.
[3] L. Barroso, K. Gharachorloo, and E. Bugnion, "Memory System Characterization of Commercial Workloads," Proc. 25th Int'l Symp. Computer Architecture, June 1998, pp. 3-14.
[4] K. Keeton et al., "Performance Characterization of a Quad Pentium Pro SMP Using OLTP Workloads," Proc. 25th Int'l Symp. Computer Architecture, CS Press, Los Alamitos, Calif., 1998, pp. 15-26.
[5] P. Trancoso et al., "The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors," Proc. Third Int'l Symp. High-Performance Computer Architecture, IEEE CS Press, Los Alamitos, Calif., 1997, pp. 250-260.
[6] Silicon Graphics, Inc., Performance Tuning and Optimization for Origin2000 and Onyx2. Jan. 1997.
[7] M. Kersten, “Using Logarithmic Code-Expansion to Speedup Index Access and Maintenance,” Proc. Int'l Conf. Foundation on Data Organization and Algorithms, pp. 228-232, Oct. 1989.
[8] P. Boncz and M. Kersten, “MIL Primitives For Querying a Fragmented World,” The VLDB J., vol. 8, no. 2, pp. 101-119, Oct. 1999.
[9] P.M.G. Apers, C.A. van den Berg, J. Flokstra, P.W.P. J. Grefen, M. Kersten, and A.N. Wilschut, “PRISMA/DB: A Parallel Main Memory Relational DBMS,” IEEE Trans. Knowledge and Data Eng., vol. 4, no. 6, pp. 541-554, Dec. 1992.
[10] P. Boncz, W. Quak, and M. Kersten, “Monet and Its Geographical Extensions: A Novel Approach to High-Performance GIS Processing,” Proc. Int'l Conf. Extending Database Technology, pp. 147-166, June 1996.
[11] P.A. Boncz, A.N. Wilschut, and M.L. Kersten, Flattening an Object Algebra to Provide Performance Proc. Int'l Conf. Data Eng., pp. 568-577, 1998.
[12] A. Shatdal, C. Kant, and J. Naughton, “Cache Conscious Algorithms for Relational Query Processing,” Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 510-512, Sept. 1994.
[13] T.J. Lehman and M.J. Carey, “A Study of Index Structures for Main Memory Database Management Systems,” Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 294-303, Aug. 1986.
[14] T. Lehman and M. Carey, “Query Processing in Main Memory Database Systems,” Proc. SIGMOD, 1986.
[15] M.H. Eich, “Main Memory Database Research Directions,” Proc. Sixth Int'l Workshop Database Machines, pp. 251-268, June 1989.
[16] A. Wilschut, “Parallel Query Execution in a Main-Memory Database System,” PhD thesis, Universiteit Twente, 1991.
[17] A. Analyti and S. Pramanik, “Fast Search in Main Memory Databases,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 215-224, June 1992.
[18] H. Garcia-Molina and K. Salem, “Main Memory Database Systems: An Overview,” IEEE Trans. Knowledge and Data Eng., vol. 4, no. 6, pp. 509-516, Dec. 1992.
[19] Times Ten Team, “In-Memory Data Management for Consumer Transactions the Times-Ten Approach,” ACM SIGMOD Record, vol. 28, no. 2, pp. 528-529, June 1999.
[20] Sybase Corp., “Adaptive Server IQ,” Whitepaper, July 1996.
[21] Compaq Corp., “Infocharger,” Whitepaper, Jan. 1998.
[22] M. Kersten, A.P.J.M. Siebes, M. Holsheimer, and F. Kwakkel, “Research and Business Challenges in Data Mining Technology,” Proc. Datenbanken in Büro, Technik und Wissenschaft, pp. 1-16, Mar. 1997.
[23] D.J. DeWitt, R.H. Katz, F. Olken, L.D. Shapiro, and M.R. Stonebraker, “Implementation Techniques for Main Memory Database Systems,” Proc. ACM SIGMOD, 1984.
[24] S. Listgarten and M.-A. Neimat, “Modelling Costs for a MM-DBMS,” Proc. Int'l Workshop Real-Time Databases, Issues, and Applications, pp. 72-78, Mar. 1996.
[25] K.-Y. Whang and R. Krishnamurthy, “Query Optimization in a Memory-Resident Domain Relational Calculus Database System,” ACM Trans. Database Systems, vol. 15, no. 1, pp. 67-95, Mar. 1990.
[26] R. Berrendorf and H. Ziegler, “PCL—The Performance Counter Library,” Techical Report FZJ-ZAM-IB-9816, ZAM, Forschungzentrum Jülich, Germany, 1998.
[27] M. Zagha, B. Larson, S. Turner, and M. Itzkowitz, "Performance Analysis Using the MIPS R10000 Performance Counters," Proc. Supercomputing '96,Pittsburgh, Pa., Nov. 1996.
[28] K.C. Yeager, “The MIPS R10000 Superscalar Microprocessor,” IEEE Micro, vol. 16, no. 2, pp. 28–40, Apr. 1996.
[29] G.P. Copeland and S. Khoshafian, “A Decomposition Storage Model,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 268-279, May 1985.
[30] Rambus Technologies, Inc., Direct Rambus Technology Disclosure, 1996,
[31] D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, and K. Yelick, A Case for Intelligent DRAM: IRAM IEEE Micro, Apr. 1997.
[32] S. McKee et al., "Smarter Memory: Improving Bandwidth for Streamed References," Computer, July 1998, pp. 54-63.
[33] Sematech, National Roadmap For Semiconductor Technology: Technology Needs, 1997,
[34] D.I. August, D.A. Connors, S.A. Mahlke, J.W. Sias, K.M. Crozier, B. Cheng, P.R. Eaton, Q.B. Olaniran, and W.W. Hwu, “Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture,” Proc. 25th Int'l Symp. Computer Architecture, pp. 227-237, June 1998.
[35] P. Valduriez, “Join Indices,” ACM Trans. Database Systems, vol. 12, no. 2, 1987.
[36] D.S. Batory, "On Searching Transposed Files," ACM Trans. Database Systems, vol. 4, no. 4, pp. 531-544, Dec. 1979.
[37] J. Clear, D. Dunn, B. Harvey, M. Heytens, P. Lohman, A. Mehta, M. Melton, H. Richardson, L. Rohrberg, A. Savasere, R. Wehrmeister, and M. Xu, “NonStopSQL/MX,” Proc. Int'l Conf. Knowledge Discovery and Data Mining, Aug. 1999.
[38] P. Boncz, S. Manegold, and M. Kersten, “Database Architecture Optimized for the New Bottleneck: Memory Access,” Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 54-65, Sept. 1999.
[39] M. Ronström, “Design and Modeling of a Parallel Data Server for Telecom Applications,” PhD thesis, Linköping Univ., 1998.
[40] C. Nyberg, T. Barclay, Z. Cvetanovic, J. Gray, and D. Lomet, “AlphaSort: A RISC Machine Sort,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 233-242, May 1994.

Index Terms:
Main-memory databases, query processing, memory access optimization, decomposed storage model, join algorithms, implementation techniques.
Stefan Manegold, Peter Boncz, Martin Kersten, "Optimizing Main-Memory Join on Modern Hardware," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 4, pp. 709-730, July-Aug. 2002, doi:10.1109/TKDE.2002.1019210
Usage of this product signifies your acceptance of the Terms of Use.