Tolerating Late Memory Traps in Dynamically Scheduled Processors
June 2004 (vol. 53 no. 6)
pp. 732-743

Abstract: In the past few years, various research papers have advocated exception support for memory functions such as virtual memory, informing memory operations, software assist for shared-memory protocols, and interactions with processors in memory. These memory traps may occur on a miss in the cache hierarchy or on a local or remote memory access. However, contemporary dynamically scheduled processors only support memory exceptions detected in the TLB associated with the first-level cache; they do not support memory exceptions taken deeper in the memory hierarchy. Traps taken deeper in the hierarchy may be late, in the sense that the exception condition may still be undecided when a long-latency memory instruction reaches the retirement stage. In this paper, we evaluate through simulation the overhead of memory traps in dynamically scheduled processors, focusing on the added overhead incurred when a memory trap is late. We also propose some simple mechanisms to reduce this added overhead while preserving the memory consistency model. With more aggressive memory access mechanisms in the processor, we observe that the overhead of all memory traps, whether early or late, increases, while the lateness of a trap becomes largely tolerated, so that the performance gap between early and late memory traps is greatly reduced. Additionally, because of caching effects in the memory hierarchy, the frequency of memory traps usually decreases as they are taken deeper in the memory hierarchy, and their overall impact on execution time becomes negligible. We conclude that support for memory traps taken throughout the memory hierarchy could be added to dynamically scheduled processors at low hardware cost and with little performance degradation.
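
The abstract's notion of a late trap can be illustrated with a toy model. The sketch below is not the simulator or any mechanism from the paper. It assumes a hypothetical `MemoryOp` record with two invented cycle counts, and it assumes a conservative processor that holds a memory instruction at retirement until its exception condition is decided.

```python
from dataclasses import dataclass


@dataclass
class MemoryOp:
    # Both fields are hypothetical quantities introduced for illustration.
    retire_ready_cycle: int   # cycle at which the op could otherwise retire
    trap_resolved_cycle: int  # cycle at which the trap condition is decided


def is_late(op: MemoryOp) -> bool:
    """A trap is late if its condition is still undecided when the
    instruction reaches the retirement stage."""
    return op.trap_resolved_cycle > op.retire_ready_cycle


def retirement_stall(op: MemoryOp) -> int:
    """Cycles the op waits at retirement, assuming the processor simply
    holds it there until the trap condition is known."""
    return max(0, op.trap_resolved_cycle - op.retire_ready_cycle)


# Condition checked in the first-level TLB: decided well before retirement.
tlb_checked = MemoryOp(retire_ready_cycle=10, trap_resolved_cycle=3)
# Condition checked at a remote memory: decided long after the op is ready.
remote_checked = MemoryOp(retire_ready_cycle=10, trap_resolved_cycle=85)
```

Under these assumptions, `tlb_checked` is an early trap candidate with no retirement stall, while `remote_checked` is late and would wait at retirement for the difference between the two cycle counts.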

Index Terms:
Microarchitecture, memory system, exception, trap, simulations, instruction-level parallelism, memory consistency model, prefetching.
Citation:
Xiaogang Qiu, Michel Dubois, "Tolerating Late Memory Traps in Dynamically Scheduled Processors," IEEE Transactions on Computers, vol. 53, no. 6, pp. 732-743, June 2004, doi:10.1109/TC.2004.18