Issue No.10 - October (2008 vol.57)
pp: 1372-1386
Prateek Pujara, Binghamton University, Binghamton
Aneesh Aggarwal, Binghamton University, Binghamton
Caches are used inefficiently because not all of the extra data brought into the cache to exploit spatial locality is actually referenced. Our experiments showed that the Level 1 (L1) data cache has a utilization of only about 57%. Increasing the efficiency of the cache (by increasing its utilization) can have significant benefits: lower cache energy consumption, a lower bandwidth requirement, and more cache space available for useful data. In this paper, we focus on mechanisms that predict the useless data in a cache block (cache noise), so that only the useful data is brought into the cache on a cache miss. The prediction mechanisms use the word-usage history of cache blocks to predict the useful data. We obtained a predictability of about 95% with a simple last-words-usage predictor. When applying cache noise prediction to the L1 data cache, we observed an improvement of about 37% in cache utilization, and reductions of about 23% and 28% in cache energy consumption and bandwidth requirement, respectively. Cache noise mispredictions increased the miss rate by 0.1% and had almost no impact on instructions per cycle (IPC).
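As a rough illustration of the idea in the abstract, the following Python sketch models a last-words-usage predictor: it remembers which words of a block were referenced during the block's previous stay in the cache and, on the next miss, fetches only those words plus the word that missed. The abstract does not give implementation details, so the table indexing (by block address), the 8-word block size, the fetch-everything fallback when no history exists, and all names (`LastWordsUsagePredictor`, `predict`, `update`, `utilization`) are assumptions made for this example, not the authors' design.

```python
WORDS_PER_BLOCK = 8  # assumption: 8 words per cache block


class LastWordsUsagePredictor:
    """Hypothetical sketch: predict useful words from the last recorded usage."""

    def __init__(self, words_per_block=WORDS_PER_BLOCK):
        self.words_per_block = words_per_block
        self.full_mask = (1 << words_per_block) - 1
        # Assumption: history is indexed by block address.
        # Each entry is a bitmask; bit i set means word i was used last time.
        self.history = {}

    def predict(self, block_addr, missed_word):
        """Return the bitmask of words to fetch on a miss to block_addr."""
        # With no history, fall back to fetching the whole block.
        mask = self.history.get(block_addr, self.full_mask)
        # The word that caused the miss is always fetched.
        return mask | (1 << missed_word)

    def update(self, block_addr, used_mask):
        """Record which words were actually used, e.g. when the block is evicted."""
        self.history[block_addr] = used_mask & self.full_mask


def utilization(fetched_mask, used_mask):
    """Fraction of fetched words that were actually used."""
    fetched = bin(fetched_mask).count("1")
    used = bin(fetched_mask & used_mask).count("1")
    return used / fetched if fetched else 0.0


predictor = LastWordsUsagePredictor()
first_fetch = predictor.predict(0x40, missed_word=2)   # no history: whole block
predictor.update(0x40, used_mask=0b00000101)           # only words 0 and 2 were used
second_fetch = predictor.predict(0x40, missed_word=0)  # fetch just words 0 and 2
```

In this toy model, fetching only the predicted words raises utilization because unused words never occupy cache space; a misprediction would appear as a later miss on a word that was left out of the fetch mask.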
Superscalar, dynamically-scheduled, and statically-scheduled implementation, Memory hierarchy
Prateek Pujara, Aneesh Aggarwal, "Cache Noise Prediction", IEEE Transactions on Computers, vol. 57, no. 10, pp. 1372-1386, October 2008, doi:10.1109/TC.2008.75