Extending Value Reuse to Basic Blocks with Compiler Support
April 2000 (vol. 49, no. 4)
pp. 331-347

Abstract—Speculative execution and instruction reuse are two important strategies that have been investigated for improving processor performance. Value prediction at the instruction level has been introduced to allow even more aggressive speculation and reuse than previous techniques. This study suggests that using compiler support to extend value reuse to a coarser granularity than a single instruction, such as a basic block, may have substantial performance benefits. We investigate the input and output values of basic blocks and find that these values can be quite regular and predictable. For the SPEC benchmark programs evaluated, 90 percent of the basic blocks have fewer than four register inputs, five live register outputs, four memory inputs, and two memory outputs. About 16 to 41 percent of all the basic blocks are simply repeating earlier calculations when the programs are compiled with the -O2 optimization level in the GCC compiler. Compiler optimizations, such as loop unrolling and function inlining, affect the sizes of basic blocks but have no significant or consistent impact on their value locality or on the resulting performance. Based on these results, we evaluate the potential benefit of basic block reuse using a novel mechanism called the block history buffer. This mechanism records the input and live output values of basic blocks to provide value reuse at the basic block level. Simulation results show that using a reasonably sized block history buffer to provide basic block reuse in a 4-way issue superscalar processor can improve execution time for the tested SPEC programs by 1 to 14 percent, with an overall average of 9 percent under reasonable hardware assumptions.
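
To make the mechanism concrete, the following is a minimal behavioral sketch, in Python, of a block history buffer as described in the abstract: each entry is indexed by a basic block's starting PC and records the block's register and memory input values together with its live output values, so that a later dynamic instance of the block whose current inputs match a recorded entry can commit the recorded outputs and skip re-executing the block. The class and method names (BlockHistoryBuffer, BHBEntry, lookup, record, reuse), the fully associative table, and the naive eviction policy are illustrative assumptions for exposition, not the paper's actual hardware design.

    class BHBEntry:
        """One recorded execution of a basic block."""
        def __init__(self, reg_inputs, mem_inputs, reg_outputs, mem_outputs):
            self.reg_inputs = reg_inputs    # {reg: value} registers read before being written
            self.mem_inputs = mem_inputs    # {addr: value} values loaded by the block
            self.reg_outputs = reg_outputs  # {reg: value} live register results
            self.mem_outputs = mem_outputs  # {addr: value} values stored by the block

    class BlockHistoryBuffer:
        def __init__(self, num_entries=1024):
            self.num_entries = num_entries
            self.table = {}                 # block start PC -> BHBEntry

        def lookup(self, block_pc, regs, memory):
            """Return the entry for block_pc only if all recorded inputs match current state."""
            entry = self.table.get(block_pc)
            if entry is None:
                return None
            regs_match = all(regs.get(r) == v for r, v in entry.reg_inputs.items())
            mem_match = all(memory.get(a) == v for a, v in entry.mem_inputs.items())
            return entry if regs_match and mem_match else None

        def record(self, block_pc, entry):
            """Install an entry, evicting an arbitrary old one when the buffer is full."""
            if block_pc not in self.table and len(self.table) >= self.num_entries:
                self.table.pop(next(iter(self.table)))
            self.table[block_pc] = entry

        def reuse(self, entry, regs, memory):
            """On a hit, commit the recorded live outputs instead of executing the block."""
            regs.update(entry.reg_outputs)
            memory.update(entry.mem_outputs)

As a toy usage example, again under the same illustrative assumptions, consider a block at PC 0x400 that computes r3 = r1 + r2 and stores the result to address 0x100:

    bhb = BlockHistoryBuffer()
    regs = {"r1": 2, "r2": 3, "r3": 0}
    memory = {}

    hit = bhb.lookup(0x400, regs, memory)
    if hit is None:
        regs["r3"] = regs["r1"] + regs["r2"]        # execute the block normally
        memory[0x100] = regs["r3"]
        bhb.record(0x400, BHBEntry({"r1": 2, "r2": 3}, {},
                                   {"r3": regs["r3"]}, {0x100: memory[0x100]}))
    else:
        bhb.reuse(hit, regs, memory)                # reuse recorded outputs, skip re-execution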


Index Terms:
Block history buffer, block reuse, compiler flow analysis, value locality, value reuse.
Citation:
Jian Huang, David J. Lilja, "Extending Value Reuse to Basic Blocks with Compiler Support," IEEE Transactions on Computers, vol. 49, no. 4, pp. 331-347, April 2000, doi:10.1109/12.844346