Optimizing the Instruction Cache Performance of the Operating System
December 1998 (vol. 47 no. 12)
pp. 1363-1381

Abstract: High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to minimize cache interference by improving the layout of the basic blocks of the code. However, the performance impact of this technique has been reported for application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. It is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes, in detail, the locality patterns of the operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. Based on our observations, we propose an algorithm to expose these localities and reduce interference in the cache. For a range of cache sizes, associativities, line sizes, and organizations, we show that we reduce total instruction miss rates by 31-86 percent, or up to 2.9 absolute points. Using a simple model, this corresponds to execution time reductions on the order of 10-25 percent. In addition, our optimized operating system combines well with optimized and unoptimized applications.
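To make the notion of cache interference concrete, the following is a minimal sketch of how code layout alone can change the miss count of a direct-mapped instruction cache. It is not the algorithm proposed in the paper. The cache geometry (256 bytes, 16-byte lines), the function names `simulate_misses` and `loop_trace`, and the two layouts are all hypothetical values chosen for illustration.

```python
def simulate_misses(trace, cache_bytes=256, line_bytes=16):
    """Count misses of a direct-mapped instruction cache on an address trace.

    The cache size and line size are illustrative assumptions, not values
    taken from the paper.
    """
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # memory block currently held by each line
    misses = 0
    for addr in trace:
        block = addr // line_bytes     # memory block holding this instruction
        index = block % num_lines      # the only cache line this block may use
        if tags[index] != block:       # a different block (or nothing) is there
            tags[index] = block
            misses += 1
    return misses


def loop_trace(loop_addr, callee_addr, iterations, size=32, step=4):
    """Fetch addresses for a loop body that calls one routine per iteration."""
    trace = []
    for _ in range(iterations):
        trace.extend(range(loop_addr, loop_addr + size, step))
        trace.extend(range(callee_addr, callee_addr + size, step))
    return trace


# Hypothetical layout 1: the callee sits exactly one cache size after the
# loop, so both map to the same cache lines and evict each other.
conflicting = simulate_misses(loop_trace(0, 256, iterations=10))

# Hypothetical layout 2: the callee is placed right after the loop body,
# so the two occupy different cache lines.
adjacent = simulate_misses(loop_trace(0, 32, iterations=10))

print(conflicting, adjacent)
```

Both traces contain the same 160 instruction fetches. Under these assumptions the conflicting layout misses 40 times and the adjacent layout only 4 times, which is the kind of effect a layout optimization tries to exploit for loops that call routines.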

Index Terms:
Cache miss rates, instruction caches, code layout optimization.
Citation:
Josep Torrellas, Chun Xia, Russell L. Daigle, "Optimizing the Instruction Cache Performance of the Operating System," IEEE Transactions on Computers, vol. 47, no. 12, pp. 1363-1381, Dec. 1998, doi:10.1109/12.737683