Issue No.12 - December (2008 vol.57)
pp: 1585-1599
Xiaogang Qiu, NVIDIA Corporation, Santa Clara
Michel Dubois, University of Southern California, Los Angeles
ABSTRACT
To support dynamic address translation in today's microprocessors, the first-level cache is accessed in parallel with a Translation Lookaside Buffer (TLB). However, this approach faces mounting problems. This paper introduces new ideas to enable the use of virtual addresses in the cache hierarchy. The major idea is the replacement of the on-chip TLB by a Synonym Lookaside Buffer (SLB). The SLB translates synonyms into a primary virtual address, which is a unique identifier resolving all ambiguities due to synonyms in the memory system. We introduce various system configurations with SLBs and discuss all functional issues associated with them. An SLB is much more scalable than a regular TLB: it scales with memory data set sizes, physical memory sizes, and the number of cores in a multiprocessor. Moreover, SLB entry flushes and shootdowns due to physical memory management are eliminated. We show performance data resulting from the simulation of several applications as diverse as scientific computing, databases, and Java virtual machines. These evaluations target SLB miss rates and flushes as well as the impact of the SLB on cache miss rates. They show that small SLBs of 8-16 entries are sufficient to solve the synonym problem in virtual caches and that their performance overhead is negligible.
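The synonym-to-primary translation described in the abstract can be illustrated with a minimal sketch. It is not the authors' design: the class name `SynonymLookasideBuffer`, the 4 KiB page size, the LRU replacement policy, and the dictionary standing in for an OS-maintained synonym table are all assumptions made for illustration. The only idea taken from the abstract is that a small buffer maps a synonym virtual page to one primary virtual page, so that every alias of the same data reaches the virtual cache under a single address.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumption: 4 KiB pages


class SynonymLookasideBuffer:
    """Toy model of an SLB: synonym virtual page -> primary virtual page."""

    def __init__(self, capacity=8, synonym_table=None):
        self.capacity = capacity
        # Hypothetical stand-in for an OS-maintained table of synonym pages.
        self.synonym_table = dict(synonym_table or {})
        # Ordered oldest-to-newest so the first item is the LRU victim (assumed policy).
        self.entries = OrderedDict()
        self.misses = 0

    def translate(self, vaddr):
        """Return the primary virtual address for vaddr."""
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.entries:
            # SLB hit: refresh the entry's recency and use the cached mapping.
            self.entries.move_to_end(vpn)
            primary_vpn = self.entries[vpn]
        elif vpn in self.synonym_table:
            # SLB miss on a synonym page: refill from the backing table.
            self.misses += 1
            primary_vpn = self.synonym_table[vpn]
            self.entries[vpn] = primary_vpn
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
        else:
            # Not a synonym: the address is treated as its own primary address.
            primary_vpn = vpn
        return (primary_vpn << PAGE_BITS) | offset


# Two aliases (pages 0x20 and 0x30) of the same primary page 0x10.
slb = SynonymLookasideBuffer(capacity=8, synonym_table={0x20: 0x10, 0x30: 0x10})
primary_a = slb.translate(0x20ABC)
primary_b = slb.translate(0x30ABC)
```

In this sketch both aliases resolve to the same primary virtual address, so a virtually indexed, virtually tagged cache would see a single copy of the data. Pages with no synonyms never occupy an entry, which is one way to read the abstract's claim that a buffer of 8-16 entries suffices.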
INDEX TERMS
Pipeline processors, Cache memories, Virtual memory, Simulation, Primary memory, Shared memory
CITATION
Xiaogang Qiu, Michel Dubois, "The Synonym Lookaside Buffer: A Solution to the Synonym Problem in Virtual Caches", IEEE Transactions on Computers, vol.57, no. 12, pp. 1585-1599, December 2008, doi:10.1109/TC.2008.108