Comprehensive Hardware and Software Support for Operating Systems to Exploit MP Memory Hierarchies
May 1999 (vol. 48 no. 5)
pp. 494-505

Abstract—High-performance multiprocessor workstations are becoming increasingly popular. Since many of the workloads running on these machines are operating-system intensive, we are interested in exploring the types of support for the operating system that the memory hierarchy of these machines should provide. In this paper, we evaluate a comprehensive set of hardware and software supports that minimize the performance losses for the operating system in a sophisticated cache hierarchy. These supports, selected from recent papers, are code layout optimization, guarded sequential instruction prefetching, instruction stream buffers, support for block operations, support for coherence activity, and software data prefetching. We evaluate these supports under a simulated environment. We show that they have a largely complementary impact and that, when combined, speed up the operating system by an average of 40 percent. Finally, a cost-performance comparison of these schemes suggests that the most cost-effective ones are code layout optimization and block operation support, while the least cost-effective one is software data prefetching.

Index Terms:
Cache hierarchies, shared-memory multiprocessors, architectural support for operating system, prefetching, trace-driven simulations, performance, block operations.
Citation:
Chun Xia, Josep Torrellas, "Comprehensive Hardware and Software Support for Operating Systems to Exploit MP Memory Hierarchies," IEEE Transactions on Computers, vol. 48, no. 5, pp. 494-505, May 1999, doi:10.1109/12.769432