An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors
December 2005 (vol. 54 no. 12)
pp. 1556-1571
High-performance, out-of-order execution processors spend a significant portion of their execution time on the incorrect program path even though they employ aggressive branch prediction algorithms. Although memory references generated on the wrong path do not change the architectural state of the processor, they affect the arrangement of data in the memory hierarchy. This paper examines the effects of wrong-path memory references on processor performance. It is shown that these references significantly affect the IPC (Instructions Per Cycle) performance of a processor. Not modeling them leads to errors of up to 10 percent (4 percent on average) in IPC estimates for the SPEC CPU2000 integer benchmarks on an out-of-order processor and errors of up to 63 percent on a runahead-execution processor. In general, the error in the IPC increases with increasing memory latency and instruction window size. We find that wrong-path references are usually beneficial for performance because they prefetch data that will be used by later correct-path references. L2 cache pollution is found to be the most significant negative effect of wrong-path references. Code examples are shown to provide insights into how wrong-path references affect performance. We also show that it is crucial to model wrong-path references to accurately estimate the performance improvement provided by runahead execution.
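The two effects named in the abstract, prefetching by wrong-path references and cache pollution, can be illustrated with a toy model. The sketch below is not the paper's simulator or methodology: the `LRUCache` class, the `correct_path_misses` helper, the two-block cache size, and both traces are hypothetical, chosen only to make each effect visible. It counts misses seen by correct-path references, with and without modeling the wrong-path references.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache holding block addresses (assumed model)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()  # insertion order tracks recency, oldest first

    def access(self, block):
        """Return True on a hit. On a miss, fill the block, evicting the LRU block if full."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        if len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[block] = None
        return False


def correct_path_misses(trace, num_blocks, model_wrong_path):
    """Count misses of correct-path references only.

    trace is a list of (block, on_wrong_path) pairs. Wrong-path references
    never count as misses themselves; when model_wrong_path is True they
    still fill (and may evict from) the cache.
    """
    cache = LRUCache(num_blocks)
    misses = 0
    for block, on_wrong_path in trace:
        if on_wrong_path:
            if model_wrong_path:
                cache.access(block)
            continue
        if not cache.access(block):
            misses += 1
    return misses


# Hypothetical trace 1: a wrong-path load to block 2 brings in data
# that a later correct-path load uses (prefetching effect).
prefetch_trace = [(1, False), (2, True), (2, False)]

# Hypothetical trace 2: a wrong-path load to block 3 evicts block 1,
# which the correct path needs again (pollution effect).
pollution_trace = [(1, False), (2, False), (3, True), (1, False)]

for name, trace in [("prefetch", prefetch_trace), ("pollution", pollution_trace)]:
    with_wp = correct_path_misses(trace, num_blocks=2, model_wrong_path=True)
    without_wp = correct_path_misses(trace, num_blocks=2, model_wrong_path=False)
    print(name, "misses with wrong path:", with_wp, "without:", without_wp)
```

In this toy setting, ignoring the wrong-path references overestimates misses in the first trace and underestimates them in the second, which is the kind of modeling error the abstract describes. The magnitudes reported in the abstract come from the paper's own experiments, not from a model like this one.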


Index Terms:
Single data stream architectures, speculative execution, runahead execution, processor performance modeling.
Citation:
Onur Mutlu, Hyesoon Kim, David N. Armstrong, Yale N. Patt, "An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors," IEEE Transactions on Computers, vol. 54, no. 12, pp. 1556-1571, Dec. 2005, doi:10.1109/TC.2005.190