Efficient Instruction Sequencing with Inline Target Insertion
December 1992 (vol. 41 no. 12)
pp. 1537-1551

Inline target insertion, a specific compiler and pipeline implementation method for delayed branches with squashing, is defined. The method is shown to offer two important features not discovered in previous studies. First, branches inserted into branch slots are correctly executed. Second, the execution returns correctly from interrupts or exceptions with only one program counter. These two features result in better performance and less software/hardware complexity than conventional delayed branching mechanisms.

Index Terms:
instruction sequencing; inline target insertion; compiler; pipeline; delayed branches; squashing; branch slots; interrupts; exceptions; program counter; parallel programming; pipeline processing; program compilers.
W.W. Hwu, P.P. Chang, "Efficient Instruction Sequencing with Inline Target Insertion," IEEE Transactions on Computers, vol. 41, no. 12, pp. 1537-1551, Dec. 1992, doi:10.1109/12.214662