Issue No.01 - January (2008 vol.57)
pp: 41-54
ABSTRACT
Microprocessor design is both complex and time-consuming: exploring a huge design space to identify the optimal design under a number of constraints is infeasible using detailed architectural simulation of entire benchmark executions. Statistical simulation is a recently introduced approach for efficiently culling the microprocessor design space. The basic idea of statistical simulation is to collect a number of important program characteristics and to generate a synthetic trace from them. Simulating this synthetic trace is extremely fast because it contains only a million instructions. This paper improves the statistical simulation methodology by proposing accurate memory data flow models. We propose (i) cache miss correlation, that is, measuring cache statistics conditionally dependent on the global cache hit/miss history, for modeling cache miss patterns and memory-level parallelism; (ii) cache line reuse distributions for modeling accesses to outstanding cache lines; and (iii) through-memory read-after-write dependency distributions for modeling load forwarding and bypassing. Our experiments using the SPEC CPU2000 benchmarks show substantial improvements compared to current state-of-the-art statistical simulation methods. For example, for our baseline configuration, we reduce the average IPC prediction error from 10.9% to 2.1%; the maximum error observed equals 5.8%.
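As a rough illustration of item (i), the sketch below profiles a stream of cache hit/miss outcomes conditioned on the most recent outcomes and then draws a synthetic stream from those conditional statistics. It is not the authors' implementation: the function names, the history length of 2, the all-hit starting history, and the treatment of unseen histories as hits are all assumptions made for this example, and the abstract does not specify any of them.

```python
import random
from collections import defaultdict


def profile_miss_correlation(outcomes, history_len=2):
    """Count hits and misses conditioned on the preceding outcomes.

    outcomes: iterable of booleans, True = cache miss, False = cache hit.
    Returns {history_tuple: [hit_count, miss_count]}.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    counts = defaultdict(lambda: [0, 0])
    history = (False,) * history_len  # assumption: start from an all-hit history
    for miss in outcomes:
        counts[history][int(miss)] += 1
        # Slide the window: append the new outcome, drop the oldest one.
        history = (history + (miss,))[1:]
    return dict(counts)


def generate_synthetic(counts, length, history_len=2, seed=0):
    """Draw a synthetic hit/miss stream from the conditional statistics."""
    rng = random.Random(seed)
    history = (False,) * history_len
    trace = []
    for _ in range(length):
        # Assumption: a history never seen during profiling is treated as a hit.
        hits, misses = counts.get(history, (1, 0))
        miss = rng.random() < misses / (hits + misses)
        trace.append(miss)
        history = (history + (miss,))[1:]
    return trace


if __name__ == "__main__":
    # Toy profile: misses arrive in pairs (hit, hit, miss, miss, ...).
    profiled = [False, False, True, True] * 200
    stats = profile_miss_correlation(profiled)
    synthetic = generate_synthetic(stats, length=1000)
    print(sum(synthetic) / len(synthetic))
```

Because the probabilities are conditioned on the hit/miss history, the synthetic stream keeps the clustering of misses seen in the profile, which a single unconditional miss rate would lose. That clustering is what matters when modeling overlapping misses and memory-level parallelism.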
INDEX TERMS
Modeling techniques, Performance Analysis and Design Aids, Simulation
CITATION
Davy Genbrugge, Lieven Eeckhout, "Memory Data Flow Modeling in Statistical Simulation for the Efficient Exploration of Microprocessor Design Spaces", IEEE Transactions on Computers, vol.57, no. 1, pp. 41-54, January 2008, doi:10.1109/TC.2007.70783