Reducing Branch Delay to Zero in Pipelined Processors
March 1993 (vol. 42 no. 3)
pp. 363-371

A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.

Index Terms:
branch delay; pipelined processors; multiple prefetch; early computation; target address; delayed branch; parallel execution; branch target instruction memory; performance; computational cost; cache lines; buffer storage; performance evaluation; pipeline processing.
A.M. Gonzalez, J.M. Llaberia, "Reducing Branch Delay to Zero in Pipelined Processors," IEEE Transactions on Computers, vol. 42, no. 3, pp. 363-371, March 1993, doi:10.1109/12.210179