Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance
March 2003 (vol. 52 no. 3)
pp. 321-336

Abstract: The widening gap between processor and main memory speeds makes memory hierarchy behavior essential to system performance. Hardware and software techniques that aim to improve this behavior both require good analysis tools to help predict and understand it. Analytical modeling is a good choice in this field because of its high speed, provided its traditionally limited precision can be overcome. We present a modular analytical modeling strategy for arbitrary set-associative caches with an LRU replacement policy. The model differs from all previous related work in its probabilistic approach. It considers both perfectly and nonperfectly nested loops, as well as reuse between different nests, which makes the analysis of complete programs with regular computations feasible. Moreover, the model achieves good accuracy while being extremely fast and flexible enough to be extended. Our approach has been extensively validated using well-known benchmarks. Finally, the model has also proven its ability to drive code optimizations even more successfully than current production compilers.
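
To make the quantity being modeled concrete, the following is a minimal sketch of a trace-driven counter for misses in a set-associative cache with LRU replacement. It is not the paper's probabilistic model: it is the slower, simulation-style baseline that an analytical model would aim to approximate. The function names (`count_misses`, `column_sweep_trace`), the parameters, and the example loop nest are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes, line_bytes, ways):
    """Count misses of a set-associative LRU cache on a byte-address trace.

    Illustrative baseline only (hypothetical helper, not from the paper).
    """
    num_sets = cache_bytes // (line_bytes * ways)
    # One ordered map per cache set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes          # memory line holding this address
        lru = sets[line % num_sets]        # cache set the line maps to
        if line in lru:
            lru.move_to_end(line)          # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) == ways:
                lru.popitem(last=False)    # evict the least recently used line
            lru[line] = True
    return misses


def column_sweep_trace(rows, cols, elem_bytes=8):
    """Byte addresses touched by a column-by-column walk over a
    row-major rows x cols array (a simple regular loop nest)."""
    return [(i * cols + j) * elem_bytes
            for j in range(cols)
            for i in range(rows)]


if __name__ == "__main__":
    trace = column_sweep_trace(4, 4)
    # Large 2-way cache: each of the 4 lines misses once, then is reused.
    print(count_misses(trace, cache_bytes=1024, line_bytes=32, ways=2))
    # Tiny direct-mapped cache: lines evict each other before any reuse.
    print(count_misses(trace, cache_bytes=64, line_bytes=32, ways=1))
```

Under these assumed parameters the same 16-access loop nest goes from 4 misses to 16 as the cache shrinks, which is the kind of dependence on cache geometry and access pattern that a miss-estimation model has to capture without replaying the whole trace.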

Index Terms:
Analytical modeling, probabilistic miss estimation, memory hierarchy, performance prediction, compiler optimizations.
Basilio B. Fraguela, Ramón Doallo, Emilio L. Zapata, "Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance," IEEE Transactions on Computers, vol. 52, no. 3, pp. 321-336, March 2003, doi:10.1109/TC.2003.1183947