The Community for Technology Leaders
Green Image
<p><b>Abstract</b>—Speculative execution and instruction reuse are two important strategies that have been investigated for improving processor performance. Value prediction at the instruction level has been introduced to allow even more aggressive speculation and reuse than previous techniques. This study suggests that using compiler support to extend value reuse to a coarser granularity than a single instruction, such as a basic block, may have substantial performance benefits. We investigate the input and output values of basic blocks and find that these values can be quite regular and predictable. For the SPEC benchmark programs evaluated, 90 percent of the basic blocks have fewer than four register inputs, five live register outputs, four memory inputs, and two memory outputs. About 16 to 41 percent of all the basic blocks are simply repeating earlier calculations when the programs are compiled with the <it>-O2</it> optimization level in the GCC compiler. Compiler optimizations, such as loop-unrolling and function inlining, affect the sizes of basic blocks, but have no significant or consistent impact on their value locality, nor the resulting performance. Based on these results, we evaluate the potential benefit of basic block reuse using a novel mechanism called the <it>block history buffer</it>. This mechanism records input and live output values of basic blocks to provide value reuse at the basic block level. Simulation results show that using a reasonably sized <it>block history buffer</it> to provide basic block reuse in a 4-way issue superscalar processor can improve execution time for the tested SPEC programs by 1 to 14 percent, with an overall average of 9 percent when using reasonable hardware assumptions.</p>
Block history buffer, block reuse, compiler flow analysis, value locality, value reuse.

D. J. Lilja and J. Huang, "Extending Value Reuse to Basic Blocks with Compiler Support," in IEEE Transactions on Computers, vol. 49, no. , pp. 331-347, 2000.
94 ms
(Ver 3.3 (11022016))