Cache Replacement Algorithms with Nonuniform Miss Costs
April 2006 (vol. 55 no. 4)
pp. 353-365
Cache replacement algorithms originally developed in the context of uniprocessors executing one instruction at a time implicitly assume that all cache misses have the same cost. However, in modern systems, some cache misses are more expensive than others. The cost may be latency, penalty, power consumption, bandwidth consumption, or any other ad hoc numerical property attached to a miss. We call the class of replacement algorithms designed to minimize a nonuniform miss cost function “cost-sensitive replacement algorithms.” In this paper, we first introduce and analyze an optimum cost-sensitive replacement algorithm (CSOPT) in the context of multiple nonuniform miss costs. CSOPT can significantly improve the cost function over OPT (the replacement algorithm minimizing miss count) in large regions of the design space. Although CSOPT is an offline and unrealizable replacement policy, it serves as a lower bound on the cost achievable by realistic cost-sensitive replacement algorithms. Using the practical example of latency cost in CC-NUMA multiprocessors, we demonstrate that, in many situations, substantial room remains to improve current replacement algorithms beyond the promise of OPT. Next, we introduce three practical extensions of LRU inspired by CSOPT and compare their performance to LRU, OPT, and CSOPT. Finally, as a practical application, we evaluate these realizable cost-sensitive replacement algorithms in the context of the second-level caches of a CC-NUMA multiprocessor with superscalar processors, using the miss latency as the cost function. By applying simple replacement policies sensitive to the latency of misses, we can improve the execution time of some parallel applications by up to 18 percent.
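
The distinction between minimizing miss count and minimizing aggregate miss cost can be illustrated with a toy trace-driven simulation. The sketch below is a minimal illustration under stated assumptions, not any of the algorithms from the paper: the function names, the fully associative cache model, the per-block costs, and the "cheapest of the oldest" victim rule are all hypothetical and chosen only to show how a cost-sensitive victim choice can lower total cost relative to plain LRU.

```python
from collections import OrderedDict


def simulate(trace, cost, capacity, choose_victim):
    """Replay a block trace on a fully associative cache.

    Returns (miss_count, total_miss_cost). `cost` maps each block to the
    price of one miss on it (e.g. latency); values here are made up.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    total_cost = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
            continue
        misses += 1
        total_cost += cost[block]
        if len(cache) >= capacity:
            victim = choose_victim(list(cache), cost)
            del cache[victim]
        cache[block] = True
    return misses, total_cost


def lru_victim(lru_order, cost):
    """Plain LRU: evict the least recently used block, ignoring cost."""
    return lru_order[0]


def cheapest_of_oldest_victim(lru_order, cost, window=2):
    """Hypothetical cost-aware rule (an assumption, not from the paper):
    among the `window` least recently used blocks, evict the one that is
    cheapest to fetch again; ties go to the older block."""
    candidates = lru_order[:window]
    return min(candidates, key=lambda b: cost[b])


# Block A stands for an expensive (e.g. remote) miss, B and C for cheap ones.
cost = {"A": 10, "B": 1, "C": 1}
trace = ["A", "B", "C", "A", "B", "C"]

lru_result = simulate(trace, cost, 2, lru_victim)
aware_result = simulate(trace, cost, 2, cheapest_of_oldest_victim)
print("LRU        (misses, cost):", lru_result)
print("cost-aware (misses, cost):", aware_result)
```

On this trace the cost-aware rule keeps the expensive block resident, so it pays the expensive miss only once, whereas LRU pays it twice. The example says nothing about how such a rule behaves on real workloads, where protecting a costly block that is never reused would waste cache space.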

Index Terms:
Cache, latency, memory system, power, replacement policy, trace-driven simulations.
Jaeheon Jeong, Michel Dubois, "Cache Replacement Algorithms with Nonuniform Miss Costs," IEEE Transactions on Computers, vol. 55, no. 4, pp. 353-365, April 2006, doi:10.1109/TC.2006.50