Eliminating Conflict Misses Using Prime Number-Based Cache Indexing
May 2005 (vol. 54 no. 5)
pp. 573-586
Alternative cache indexing (hashing) functions are a popular technique for reducing conflict misses by distributing cache accesses more uniformly across the sets in the cache. Although various alternative hashing functions have been shown to eliminate worst-case conflict behavior, no prior study has analyzed the pathological behavior of such hashing functions, which often results in performance slowdown. In this paper, we present an in-depth analysis of the pathological behavior of cache hashing functions. Based on the analysis, we propose two new hashing functions, prime modulo and odd-multiplier displacement, that are resistant to pathological behavior yet still eliminate worst-case conflict behavior in the L2 cache. We show that both schemes can be implemented in fast hardware using a set of narrow addition operations, with negligible fragmentation in the L2 cache. We evaluate the schemes on 23 memory-intensive applications. For applications with nonuniform cache accesses, both prime modulo and odd-multiplier displacement hashing achieve an average speedup of 1.27 over traditional hashing, without slowing down any of the 23 benchmarks. We also evaluate the odd-multiplier displacement function with multiple multipliers in conjunction with a skewed associative L2 cache. The skewed associative cache achieves a better average speedup, at the cost of some pathological behavior that slows down four applications by up to 7 percent.
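
To make the idea concrete, the following Python sketch compares traditional indexing with simplified versions of the two proposed functions on a worst-case strided access pattern. It is only an illustration under stated assumptions, not the paper's hardware design: the abstract does not give the formulas, so the prime modulo function is assumed to be the block address modulo the largest prime not exceeding the number of sets, and the odd-multiplier displacement function is assumed to be (multiplier × tag + traditional index) modulo the number of sets. The function names, the multiplier 9, and the cache geometry (2048 sets) are all hypothetical choices.

```python
def largest_prime_at_most(n):
    """Return the largest prime <= n (trial division; fine for small n)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    while not is_prime(n):
        n -= 1
    return n


def traditional_index(block_addr, n_sets):
    # Traditional indexing: low-order bits of the block address.
    return block_addr % n_sets


def prime_modulo_index(block_addr, n_sets):
    # Assumed form: modulo the largest prime <= n_sets.
    # The few sets above the prime go unused (fragmentation).
    return block_addr % largest_prime_at_most(n_sets)


def odd_multiplier_index(block_addr, n_sets, multiplier=9):
    # Assumed form: displace the traditional index by an odd
    # multiple of the tag. The multiplier 9 is a hypothetical choice.
    tag, index = divmod(block_addr, n_sets)
    return (multiplier * tag + index) % n_sets


def sets_touched(index_fn, n_sets, stride, count):
    """Count distinct sets used by `count` accesses at a fixed block stride."""
    return len({index_fn(i * stride, n_sets) for i in range(count)})


if __name__ == "__main__":
    n_sets, stride, count = 2048, 2048, 64
    for fn in (traditional_index, prime_modulo_index, odd_multiplier_index):
        print(fn.__name__, sets_touched(fn, n_sets, stride, count))
```

With a stride equal to the number of sets, traditional indexing maps every access to the same set, while both alternative functions in this sketch spread the 64 accesses over 64 distinct sets. With 2048 sets the largest prime is 2039, so the prime modulo variant leaves 9 sets unused, which is consistent with the abstract's description of the fragmentation as negligible.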

Index Terms:
Cache hashing, cache indexing, prime modulo, odd-multiplier displacement, conflict misses.
Citation:
Mazen Kharbutli, Yan Solihin, Jaejin Lee, "Eliminating Conflict Misses Using Prime Number-Based Cache Indexing," IEEE Transactions on Computers, vol. 54, no. 5, pp. 573-586, May 2005, doi:10.1109/TC.2005.79