Proceedings.International Conference on Parallel Architectures and Compilation Techniques (2002)
Sept. 22, 2002 to Sept. 25, 2002
Zhenlin Wang , University of Massachusetts at Amherst
Kathryn S. McKinley , University of Texas at Austin
Arnold L. Rosenberg , University of Massachusetts at Amherst
Charles C. Weems , University of Massachusetts at Amherst
Memory performance is increasingly determining microprocessor performance and technology trends are exacerbating this problem. Most architectures use set-associative caches with LRU replacement policies to combine fast access with relatively low miss rates. To improve replacement decisions in set-associative caches, we develop a new set of compiler algorithms that predict which data will and will not be reused and provide these hints to the architecture. We prove that the hints either match or improve hit rates over LRU. We describe a practical one-bit cache-line tag implementation of our algorithm, called evict-me. On a cache replacement, the architecture will replace a line for which the evict-me bit is set, or if none is set, it will use the LRU bits. We implement our compiler analysis and its output in the Scale compiler. On a variety of scientific programs, using the evict-me algorithm in both the level 1 and 2 caches improves simulated cycle times by up to 34% over the LRU policy by increasing hit rates. In addition, a combination of simple hardware prefetching and evict-me works together to further improve performance.
C. C. Weems, K. S. McKinley, A. L. Rosenberg and Z. Wang, "Using the Compiler to Improve Cache Replacement Decisions," Proceedings.International Conference on Parallel Architectures and Compilation Techniques(PACT), Charlottesville, Virginia, 2002, pp. 199.