Three Architectural Models for Compiler-Controlled Speculative Execution
April 1995 (vol. 44 no. 4)
pp. 481-494

Abstract—To effectively exploit instruction-level parallelism, the compiler must move instructions across branches. When an instruction is moved above a branch on which it is control dependent, it is executed speculatively: it runs before it is known whether its result is needed. Speculative execution introduces potential hazards; if these hazards can be eliminated, the compiler can schedule the code more aggressively. This paper outlines the hazards of speculative execution and discusses three architectural models (restricted, general, and boosting) that provide increasing amounts of hardware support for removing these hazards. The performance gained at each level of additional hardware support is analyzed using the IMPACT C compiler, which performs superblock scheduling for superscalar and superpipelined processors.

Index Terms:
Conditional branches, exception handling, speculative execution, static code scheduling, superblock, superpipelining, superscalar.
Nancy J. Warter, Pohua P. Chang, Scott A. Mahlke, William Y. Chen, Wen-mei W. Hwu, "Three Architectural Models for Compiler-Controlled Speculative Execution," IEEE Transactions on Computers, vol. 44, no. 4, pp. 481-494, April 1995, doi:10.1109/12.376164