Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes
October 1995 (vol. 44 no. 10)
pp. 1223-1235

Abstract—In previous research, we have developed and presented a model for measuring machines and analyzing programs, and for accurately predicting the running time of any analyzed program on any measured machine. That work is extended here by: 1) developing a high level program to measure the design and performance of the cache and TLB units; 2) using those measurements, along with published miss ratio data, to improve the accuracy of our runtime predictions; 3) using our analysis tools and measurements to study and compare the design of several machines, with particular reference to their cache and TLB performance. As part of this work, we describe the design and performance of the cache and TLB for ten machines. The work presented in this paper extends a powerful technique for the evaluation and analysis of both computer systems and their workloads; this methodology is valuable both to computer users and computer system designers.

Index Terms:
Performance evaluation, execution time prediction, memory hierarchy, processor caches, translation lookaside buffers.
Alan Jay Smith, Rafael H. Saavedra, "Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes," IEEE Transactions on Computers, vol. 44, no. 10, pp. 1223-1235, Oct. 1995, doi:10.1109/12.467697