On Augmenting Trace Cache for High-Bandwidth Value Prediction
September 2002 (vol. 51 no. 9)
pp. 1074-1088

Abstract—Value prediction is a technique that breaks true data dependences by predicting the outcome of an instruction and speculatively executing its data-dependent instructions based on the predicted outcome. As the instruction fetch rate and issue rate of processors increase, the potential data dependences among instructions issued in the same cycle also increase, so value prediction and speculative execution become critical to keeping the issue rate high. Unfortunately, most proposed value prediction schemes have focused only on prediction accuracy and have yet to consider the bandwidth required to access the value prediction tables. In this paper, we focus on the bandwidth issues of value prediction. We propose augmenting the trace cache [19], [26] (which was proposed to provide the fetch bandwidth required by wide-issue ILP processors) with a copy of the predicted values, and moving the generation of those predicted values (which requires accessing the value prediction tables) from the instruction fetch stage to a later stage, e.g., the writeback stage. This change allows "selective value prediction," i.e., only those instructions that require value prediction access the value prediction tables, which can significantly reduce the bandwidth requirement of those tables. We also use a dynamic classification scheme to steer predictor updates to behavior-specific tables (such as last-value, stride, and two-level tables). A relatively even split of accesses among these tables further moderates their bandwidth requirements.
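To make the mechanism concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a toy per-PC classifier (the names `ValueHistory`, `classify`, `SteeredValuePredictor`, and `threshold` are hypothetical) that labels an instruction last-value or stride once the same delta has repeated `threshold` times in a row; two-level and other context-based tables are omitted. Predictions are generated at writeback and copied into a dictionary standing in for the predicted-value field of a trace cache entry, so fetch reads that copy instead of the prediction tables, and instructions classified as unpredictable never access a behavior-specific table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ValueHistory:
    """Hypothetical per-PC history used only to classify value behavior."""
    last: Optional[int] = None  # most recent result value
    stride: int = 0             # most recent delta between consecutive results
    repeat: int = 0             # consecutive times the same delta was observed

    def observe(self, value: int) -> None:
        if self.last is not None:
            delta = value - self.last
            self.repeat = self.repeat + 1 if delta == self.stride else 0
            self.stride = delta
        self.last = value


def classify(h: ValueHistory, threshold: int) -> str:
    """Steer to a behavior-specific table, or to none if not (yet) predictable."""
    if h.last is None or h.repeat < threshold:
        return "none"
    return "last_value" if h.stride == 0 else "stride"


class SteeredValuePredictor:
    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.history: Dict[int, ValueHistory] = {}
        self.last_value_table: Dict[int, int] = {}  # pc -> last value
        self.stride_table: Dict[int, int] = {}      # pc -> stride
        self.table_accesses = {"last_value": 0, "stride": 0}
        # Hypothetical stand-in for the predicted-value copy kept in the trace cache.
        self.trace_cache_values: Dict[int, int] = {}

    def writeback(self, pc: int, value: int) -> None:
        """Update at writeback and regenerate the prediction for the next instance."""
        h = self.history.setdefault(pc, ValueHistory())
        h.observe(value)
        kind = classify(h, self.threshold)
        if kind == "none":
            # Selective prediction: no behavior-specific table is accessed.
            self.trace_cache_values.pop(pc, None)
            return
        self.table_accesses[kind] += 1
        if kind == "last_value":
            self.last_value_table[pc] = value
            prediction = value
        else:
            self.stride_table[pc] = h.stride
            prediction = value + h.stride
        self.trace_cache_values[pc] = prediction

    def fetch(self, pc: int) -> Optional[int]:
        """At fetch, read the copied prediction; the prediction tables are not accessed."""
        return self.trace_cache_values.get(pc)


p = SteeredValuePredictor(threshold=2)
for v in [5, 5, 5, 5]:   # constant result: eventually steered to the last-value table
    p.writeback(0x10, v)
for v in [0, 4, 8, 12]:  # constant stride of 4: eventually steered to the stride table
    p.writeback(0x20, v)
for v in [3, 9, 1, 7]:   # irregular results: never steered to a table
    p.writeback(0x30, v)
print(p.fetch(0x10), p.fetch(0x20), p.fetch(0x30), p.table_accesses)
```

The exact-match repeat counter is a deliberate simplification chosen to keep the sketch short; a hardware classifier would more likely use saturating confidence counters.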

[1] D. Burger and T. Austin, "The SimpleScalar Tool Set, Version 2.0," Technical Report CS-TR-97-1342, Univ. of Wisconsin, Madison, June 1997.
[2] B. Calder, G. Reinman, and D. Tullsen, "Selective Value Prediction," Proc. 26th Int'l Symp. Computer Architecture, 1999.
[3] P.-Y. Chang, E. Hao, and Y.N. Patt, "Alternative Implementations of Hybrid Branch Predictors," Proc. 28th Ann. Int'l Symp. Microarchitecture, pp. 252-257, Dec. 1995.
[4] D.H. Friendly, S.J. Patel, and Y.N. Patt, "Alternative Fetch and Issue Techniques for the Trace Cache Fetch Mechanism," Proc. 30th Ann. ACM/IEEE Int'l Symp. Microarchitecture, 1997.
[5] D.H. Friendly, S.J. Patel, and Y.N. Patt, "Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors," Proc. 30th Ann. ACM/IEEE Int'l Symp. Microarchitecture, 1997.
[6] F. Gabbay and A. Mendelson, “Speculative Execution Based on Value Prediction,” TR 1080, Electrical Eng. Dept., Technion Israel Inst. of Technology, Nov. 1996.
[7] F. Gabbay and A. Mendelson, "The Effect of Instruction Fetch Bandwidth on Value Prediction," Proc. 25th Int'l Symp. Computer Architecture (ISCA-25), pp. 272-281, 1998.
[8] A. Gonzalez, J. Tubella, and C. Molina, “Trace-Level Reuse,” Proc. Int'l Conf. Parallel Processing, Sept. 1999.
[9] J. González and A. González, “The Potential of Data Value Speculation to Boost ILP,” Proc. Int'l Conf. Supercomputing, 1998.
[10] J. Huang and D. Lilja, "Exploiting Basic Block Value Locality with Block Reuse," Proc. Fifth Int'l Symp. High Performance Computer Architecture (HPCA-5), pp. 106-114, Jan. 1999.
[11] Q. Jacobson, E. Rotenberg, and J.E. Smith, “Path-Based Next Trace Prediction,” Proc. 30th Ann. ACM/IEEE Int'l Symp. Microarchitecture, 1997.
[12] M. Johnson, Superscalar Microprocessor Design. Prentice Hall, 1991.
[13] A. KleinOsowski, J. Flynn, N. Meares, and D. Lilja, "Adapting the SPEC 2000 Benchmark Suite for Simulation-Based Computer Architecture Research," Proc. Workshop Workload Characterization, Int'l Conf. Computer Design, Sept. 2000.
[14] S. Lee, Y. Wang, and P. Yew, “Decoupled Value Prediction on Trace Processors,” Proc. Sixth Int'l Symp. High Performance Computer Architecture (HPCA-6), 2000.
[15] S. Lee and P. Yew, “On Some Implementation Issues for Value Prediction on Wide-Issue ILP Processors,” Proc. Int'l Conf. Parallel Architectures and Compilation Techniques (PACT2000), Oct. 2000.
[16] M. Lipasti and J. Shen, "Exceeding the Dataflow Limit via Value Prediction," Proc. 29th Int'l Symp. Microarchitecture (MICRO-29), Dec. 1996.
[17] M.H. Lipasti, Value Locality and Speculative Execution, doctoral dissertation, Carnegie Mellon Univ., Dept. Electrical and Computer Eng., May 1997.
[18] S. McFarling, “Combining Branch Predictors,” Technical Report TN-36, Digital Equipment Corp., Western Research Lab, June 1993.
[19] S.J. Patel, D.H. Friendly, and Y.N. Patt, “Evaluation of Design Options for the Trace Cache Fetch Mechanism,” IEEE Trans. Computers, special issue on cache memory and related problems, vol. 48, no. 2, pp. 193-204, Feb. 1999.
[20] M. Postiff, G. Tyson, and T. Mudge, “Performance Limits of Trace Caches,” J. Instruction Level Parallelism, Oct. 1999.
[21] Q. Zhao, S. Lee, and D. Lilja, “Using Hyperprediction to Compensate for Delayed Updates in Value Predictors,” Technical Report No. ARCTiC 01-02, Laboratory for Advanced Research in Computing Technology and Compilers, Univ. of Minnesota, June 2001.
[22] R. Rakvic, B. Black, and P. Shen, “Completion Time Multiple Branch Prediction for Enhancing Trace Cache Performance,” Proc. 27th Ann. Int'l Symp. Computer Architecture (ISCA-27), June 2000.
[23] G. Reinman, T. Austin, and B. Calder, "A Scalable Front-End Architecture for Fast Instruction Delivery," Proc. 26th Ann. Int'l Symp. Computer Architecture, pp. 234-245, 1999.
[24] E. Rotenberg, S. Bennett, and J. Smith, "Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching," Proc. 29th Ann. ACM/IEEE Int'l Symp. Microarchitecture, pp. 24-34, 1996.
[25] E. Rotenberg, Q. Jacobson, Y. Sazeides, and J.E. Smith, "Trace Processors," Proc. 30th Int'l Symp. Microarchitecture, pp. 138-148, 1997.
[26] E. Rotenberg, S. Bennett, and J. Smith, "A Trace Cache Microarchitecture and Evaluation," IEEE Trans. Computers, vol. 48, no. 2, Feb. 1999.
[27] B. Rychlik, J. Faistl, B. Krug, A. Kurland, J. Jung, M. Velev, and J. Shen, “Efficient and Accurate Value Prediction Using Dynamic Classification,” technical report, Microarchitecture Research Team, Dept. of Electrical and Computer Eng., Carnegie Mellon Univ., 1998.
[28] B. Rychlik, J. Faistl, B. Krug, and J. Shen, “Efficacy and Performance Impact of Value Prediction,” Parallel Architectures and Compilation Techniques, Oct. 1998.
[29] T. Sato, "Analyzing Overhead of Reissued Instructions on Data Speculative Processors," Proc. Workshop Performance Analysis and Its Impact on Design (ISCA-25), 1998.
[30] Y. Sazeides and J. Smith, "The Predictability of Data Values," Proc. 30th Ann. Int'l Symp. Microarchitecture (MICRO-30), pp. 248-258, Dec. 1997.
[31] Y. Sazeides and J. Smith, “Implementations of Context-Based Value Predictors,” Technical Report ECE-TR-97-8, Univ. of Wisconsin, Dec. 1997.
[32] K. Wang and M. Franklin, "Highly Accurate Data Value Prediction Using Hybrid Predictors," Proc. 30th Int'l Symp. Microarchitecture, 1997.
[33] T.-Y. Yeh and Y.N. Patt, "Two-Level Adaptive Branch Prediction," Proc. 24th Ann. ACM/IEEE Int'l Symp. Microarchitecture, pp. 51-61, 1991.

Index Terms:
Value prediction, trace cache, instruction-level parallelism, data dependences, dynamic classification.
Citation:
Sang-Jeong Lee, Pen-Chung Yew, "On Augmenting Trace Cache for High-Bandwidth Value Prediction," IEEE Transactions on Computers, vol. 51, no. 9, pp. 1074-1088, Sept. 2002, doi:10.1109/TC.2002.1032626