A Simple Mechanism for Detecting Ineffectual Instructions in Slipstream Processors
April 2004 (vol. 53 no. 4)
pp. 399-413
A slipstream processor accelerates a program by speculatively removing repeatedly ineffectual instructions. Detecting the roots of ineffectual computation—unreferenced writes, nonmodifying writes, and correctly predicted branches—is straightforward. On the other hand, detecting ineffectual instructions in the backward slices of these root instructions currently requires complex back-propagation circuitry. We observe that, by logically monitoring the speculative program (instead of the original program), back-propagation can be reduced to detecting unreferenced writes. That is, once root instructions are actually removed, instructions at the next higher level in the backward slice become newly exposed unreferenced writes in the speculative program. This new algorithm, called implicit back-propagation, eliminates complex hardware and achieves an average performance improvement of 11.8 percent, only marginally lower than the 12.3 percent improvement achieved with explicit back-propagation. We further simplify the hardware component by electing not to detect ineffectual memory writes, focusing only on ineffectual register writes. A minimal implementation consisting of only a register-indexed table (similar to an architectural register file) achieves a good balance between complexity and performance (11.2 percent average performance improvement with implicit back-propagation and without detection of ineffectual memory writes).
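The idea of implicit back-propagation can be illustrated in software. The following is a minimal sketch, not the hardware design described in the article: it assumes a straight-line trace in which each PC appears once, removes an instruction as soon as it is seen to be ineffectual (the confidence mechanism implied by "repeatedly ineffectual" is omitted), and tracks register writes only. The names `Instr`, `find_unreferenced_writes`, `implicit_back_propagation`, and `EXAMPLE_TRACE` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    pc: int                       # instruction address (unique in this sketch)
    dest: Optional[str]           # destination register, or None (e.g. a store)
    srcs: Tuple[str, ...] = ()    # source registers


def find_unreferenced_writes(trace, removed):
    """One pass over the speculative program (the trace minus removed PCs).

    A register-indexed table records, per register, the PC of the latest
    producer and whether its value has been read. A producer that is
    overwritten before being read is an unreferenced write.
    """
    table = {}   # register -> [producer pc, referenced flag]
    dead = set()
    for ins in trace:
        if ins.pc in removed:
            continue  # removed instructions neither read nor write registers
        for reg in ins.srcs:
            if reg in table:
                table[reg][1] = True          # mark the producer as referenced
        if ins.dest is not None:
            prev = table.get(ins.dest)
            if prev is not None and not prev[1]:
                dead.add(prev[0])             # overwritten without being read
            table[ins.dest] = [ins.pc, False]
    return dead


def implicit_back_propagation(trace):
    """Repeat unreferenced-write detection on the shrinking speculative program."""
    removed = set()
    while True:
        newly_dead = find_unreferenced_writes(trace, removed) - removed
        if not newly_dead:
            return removed
        removed |= newly_dead


# Hypothetical trace: pc 0 -> pc 1 -> pc 2 form a dependence chain whose
# final result (r3) is overwritten at pc 3 before anyone reads it.
EXAMPLE_TRACE = [
    Instr(0, "r1", ("r9",)),
    Instr(1, "r2", ("r1",)),
    Instr(2, "r3", ("r2",)),
    Instr(3, "r3", ("r8",)),
    Instr(4, "r2", ("r8",)),
    Instr(5, "r1", ("r8",)),
    Instr(6, None, ("r3",)),
]
```

In this example, the first pass finds only the write at pc 2. Once that instruction is removed, the write at pc 1 has no remaining reader and is detected on the next pass, followed by pc 0. No explicit walk of the backward slice is needed; each level is exposed as an ordinary unreferenced write in the speculative program.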
Index Terms:
Microarchitecture, multithreading, chip multiprocessor, slipstream, preexecution.
Citation:
Jinson J. Koppanalil, Eric Rotenberg, "A Simple Mechanism for Detecting Ineffectual Instructions in Slipstream Processors," IEEE Transactions on Computers, vol. 53, no. 4, pp. 399-413, Apr. 2004, doi:10.1109/TC.2004.1268397