Issue No.04 - April (2006 vol.55)
Michel Dubois, IEEE
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2006.50
Cache replacement algorithms originally developed in the context of uniprocessors executing one instruction at a time implicitly assume that all cache misses have the same cost. However, in modern systems, some cache misses are more expensive than others. The cost may be latency, penalty, power consumption, bandwidth consumption, or any other ad hoc numerical property attached to a miss. We call the class of replacement algorithms designed to minimize a nonuniform miss cost function "cost-sensitive replacement algorithms." In this paper, we first introduce and analyze an optimum cost-sensitive replacement algorithm (CSOPT) in the context of multiple nonuniform miss costs. CSOPT can significantly improve the cost function over OPT (the replacement algorithm minimizing miss count) in large regions of the design space. Although CSOPT is an offline and unrealizable replacement policy, it serves as a lower bound on the cost achievable by realistic cost-sensitive replacement algorithms. Using the practical example of latency cost in CC-NUMA multiprocessors, we demonstrate that, in many situations, current replacement algorithms leave substantial room for improvement beyond what OPT promises. Next, we introduce three practical extensions of LRU inspired by CSOPT and compare their performance to LRU, OPT, and CSOPT. Finally, as a practical application, we evaluate these realizable cost-sensitive replacement algorithms in the context of the second-level caches of a CC-NUMA multiprocessor with superscalar processors, using the miss latency as the cost function. By applying simple replacement policies sensitive to the latency of misses, we can improve the execution time of some parallel applications by up to 18 percent.
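To make the distinction between miss count and aggregate miss cost concrete, the following minimal sketch replays a short trace through a fully associative cache under two policies: plain LRU, and a naive policy that evicts the resident block that is cheapest to refetch. This is an illustration only. It is not CSOPT, nor any of the three LRU extensions evaluated in the paper, and the function names, the trace, and the per-block costs are all hypothetical.

```python
from collections import OrderedDict


def simulate_lru(trace, cost, capacity):
    """Return (miss_count, total_miss_cost) for a fully associative LRU cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    total_cost = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
            continue
        misses += 1
        total_cost += cost[block]
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used block
        cache[block] = True
    return misses, total_cost


def simulate_cheapest_victim(trace, cost, capacity):
    """Hypothetical cost-aware policy: evict the block with the lowest miss
    cost, breaking ties in favor of the least recently used block."""
    cache = OrderedDict()
    misses = 0
    total_cost = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)
            continue
        misses += 1
        total_cost += cost[block]
        if len(cache) >= capacity:
            # min() returns the first minimum in iteration order, and the
            # OrderedDict iterates from LRU to MRU, so ties go to the LRU block.
            victim = min(cache, key=lambda b: cost[b])
            del cache[victim]
        cache[block] = True
    return misses, total_cost


# Assumed example: block "A" is expensive to refetch (e.g., a remote miss),
# while "B" and "C" are cheap (e.g., local misses).
example_cost = {"A": 10, "B": 1, "C": 1}
example_trace = ["A", "B", "C", "A", "B", "C", "A"]

print(simulate_lru(example_trace, example_cost, capacity=2))              # (7, 34)
print(simulate_cheapest_victim(example_trace, example_cost, capacity=2))  # (5, 14)
```

On this trace the cost-aware policy keeps the expensive block resident and lowers the aggregate cost from 34 to 14. A purely greedy policy like this one can also pin a high-cost block that is never referenced again, which is one reason a practical policy needs to balance cost against locality.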
Cache, latency, memory system, power, replacement policy, trace-driven simulations.
Jaeheon Jeong, Michel Dubois, "Cache Replacement Algorithms with Nonuniform Miss Costs", IEEE Transactions on Computers, vol. 55, no. 4, pp. 353-365, April 2006, doi:10.1109/TC.2006.50