Issue No.04 - April (2008 vol.57)
pp: 433-447
Recent studies have shown that in highly associative caches, the performance gap between the Least Recently Used (LRU) and the theoretical optimal replacement algorithms is large, motivating the design of alternative replacement algorithms to improve cache performance. In LRU replacement, a line, after its last use, remains in the cache for a long time until it becomes the LRU line. Such dead lines unnecessarily reduce the cache capacity available for other lines. In addition, in multi-level caches, temporal reuse patterns are often inverted: reuse shows up in the L1 cache but, because the L1 cache filters accesses, does not show up in the L2 cache. At the L2, these lines appear to be brought into the cache but are never used before they are replaced. These lines unnecessarily pollute the L2 cache. This paper proposes a new counter-based approach to address both problems. For the former problem, we predict lines that have become dead and replace them early from the L2 cache. For the latter problem, we identify never-used lines, bypass the L2 cache, and place them directly in the L1 cache. Both techniques are achieved through a single counter-based mechanism. In our approach, each line in the L2 cache is augmented with an event counter that is incremented when an event of interest, such as certain cache accesses, occurs. When the counter reaches a threshold, the line "expires" and becomes replaceable. Each line's threshold is unique and is dynamically learned. We propose and evaluate two new replacement algorithms: Access Interval Predictor (AIP) and Live-time Predictor (LvP). AIP and LvP speed up 10 capacity-constrained SPEC2000 benchmarks by up to 40% and by 11% on average. Cache bypassing further reduces L2 cache pollution and improves the average speedups to 13-14%.
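The counter-and-threshold mechanism described above can be illustrated with a minimal sketch of a single cache set. This is not the paper's AIP or LvP design: the choice of event (any access to another line in the same set), the learning rule (largest interval observed between two accesses to a line), the per-tag `learned` table, and all class and field names are assumptions made for illustration. Bypassing is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Line:
    tag: str
    counter: int = 0                 # events seen since this line's last access
    threshold: Optional[int] = None  # learned expiry threshold; None = not yet learned
    last_use: int = 0                # timestamp used for the LRU fallback


class CounterSet:
    """One cache set whose lines become replaceable once they expire."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.lines: List[Line] = []
        self.clock = 0
        # Hypothetical prediction table: thresholds remembered across evictions.
        self.learned: Dict[str, Optional[int]] = {}

    def expired(self, line: Line) -> bool:
        # A line with no learned threshold is never treated as expired.
        return line.threshold is not None and line.counter > line.threshold

    def pick_victim(self) -> Line:
        # Prefer expired lines; fall back to plain LRU when none has expired.
        pool = [l for l in self.lines if self.expired(l)] or self.lines
        return min(pool, key=lambda l: l.last_use)

    def access(self, tag: str) -> bool:
        """Access `tag`; return True on a hit, False on a miss."""
        self.clock += 1
        hit = None
        for line in self.lines:
            if line.tag == tag:
                hit = line
            else:
                line.counter += 1  # assumed event: access to another line in the set
        if hit is not None:
            # Assumed learning rule: keep the largest interval seen so far.
            hit.threshold = max(hit.threshold or 0, hit.counter)
            hit.counter = 0
            hit.last_use = self.clock
            return True
        if len(self.lines) >= self.ways:
            victim = self.pick_victim()
            self.learned[victim.tag] = victim.threshold
            self.lines.remove(victim)
        self.lines.append(
            Line(tag, threshold=self.learned.get(tag), last_use=self.clock)
        )
        return False


if __name__ == "__main__":
    s = CounterSet(ways=2)
    for t in ["A", "B", "B", "C"]:
        s.access(t)
    print([line.tag for line in s.lines])
```

In this 2-way example, line `B` is touched twice back to back, so its learned threshold is 0 and it expires as soon as another access reaches the set. On the miss to `C`, the sketch evicts the expired `B` even though `A` is the LRU line, which is where a counter-based policy and LRU diverge.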
Cache memories, Cache Replacement, Cache Bypassing, Counter-Based Algorithms, Cache Misses
Mazen Kharbutli, "Counter-Based Cache Replacement and Bypassing Algorithms", IEEE Transactions on Computers, vol.57, no. 4, pp. 433-447, April 2008, doi:10.1109/TC.2007.70816