Issue No. 03 - March (1993 vol. 42)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/12.210179
<p>A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.</p>
branch delay; pipelined processors; multiple prefetch; early computation; target address; delayed branch; parallel execution; branch target instruction memory; performance; computational cost; cache lines; buffer storage; performance evaluation; pipeline processing.
A.M. Gonzalez, J.M. Llaberia, "Reducing Branch Delay to Zero in Pipelined Processors", IEEE Transactions on Computers, vol. 42, no. 3, pp. 363-371, March 1993, doi:10.1109/12.210179