Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses
November 2001 (vol. 50 no. 11)
pp. 1191-1201

Abstract: Modern CPUs often use large physically indexed caches that are direct-mapped or have low associativity. Such caches do not interact well with virtual memory systems. An improperly placed physical page ends up in the wrong place in the cache, causing excessive conflicts with other cached pages. Page coloring has been proposed to reduce these conflict misses by carefully placing pages in physical memory. While page coloring works well for some applications, many factors limit its performance: it restricts the freedom of the page placement system and may increase swapping traffic. In this paper, we propose a novel and simple architecture, called color-indexed, physically tagged caches, which can significantly reduce conflict misses. With some simple modifications to the TLB (Translation Look-aside Buffer), the new architecture decouples the addresses of the cache from the addresses of the main memory. Since the cache addresses no longer depend on the physical memory addresses, the system can freely place data in any cache page to minimize conflict misses, without affecting the paging system. Extensive trace-driven simulation results show that our design performs much better than traditional page coloring techniques. The new scheme enables a direct-mapped cache to achieve hit ratios very close to or better than those of a two-way set-associative cache. Moreover, the architecture does not increase cache access latency, which is a drawback of set-associative caches. The hardware overhead is minimal. We show that our scheme can reduce the cache size by 50 percent without sacrificing performance. A two-way set-associative cache that uses this strategy can perform very close to a fully associative cache.
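To make the indexing idea concrete, the following is a minimal sketch, not the authors' implementation. It contrasts a conventional physically indexed cache, where the set index comes from physical address bits, with one possible reading of a color-indexed cache, where the page-level part of the index is a "color" supplied by the TLB entry. The page size, cache size, line size, and all function names are assumptions chosen for illustration; this page does not give the paper's actual TLB format or parameters.

```python
# Illustrative parameters (assumptions, not taken from the paper).
PAGE_SIZE = 4096        # bytes per page
CACHE_SIZE = 1 << 20    # 1 MB cache
LINE_SIZE = 32          # bytes per cache line
ASSOC = 1               # direct-mapped

NUM_SETS = CACHE_SIZE // (LINE_SIZE * ASSOC)
NUM_COLORS = CACHE_SIZE // (PAGE_SIZE * ASSOC)   # number of page-sized cache regions
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


def physical_color(paddr: int) -> int:
    """Color implied by the physical page number in a conventional cache."""
    return (paddr // PAGE_SIZE) % NUM_COLORS


def set_index_physical(paddr: int) -> int:
    """Conventional physically indexed cache: index bits come from paddr."""
    return (paddr // LINE_SIZE) % NUM_SETS


def set_index_color(color: int, paddr: int) -> int:
    """Hypothetical color-indexed lookup: the page-level index bits come from
    a color stored alongside the translation (assumed here to live in the TLB
    entry); only the page-offset bits come from the address. The tag would
    still be derived from the physical address (physically tagged)."""
    return color * LINES_PER_PAGE + (paddr % PAGE_SIZE) // LINE_SIZE


# Two physical pages whose frame numbers differ by NUM_COLORS collide
# in the conventional cache...
a = 0 * PAGE_SIZE + 0x40
b = NUM_COLORS * PAGE_SIZE + 0x40
same_physical_set = set_index_physical(a) == set_index_physical(b)

# ...but can be given different colors without moving them in memory.
different_color_sets = set_index_color(0, a) != set_index_color(1, b)
```

In this sketch, avoiding the conflict only requires assigning the two pages different colors; the physical frames themselves stay where the paging system put them, which is the decoupling the abstract describes.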

Index Terms:
Novel memory architectures, cache, TLB, memory systems, performance enhancement.
Citation:
Rui Min, Yiming Hu, "Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses," IEEE Transactions on Computers, vol. 50, no. 11, pp. 1191-1201, Nov. 2001, doi:10.1109/12.966494