Efficient Exploitation of Instruction-Level Parallelism for Superscalar Processors by the Conjugate Register File Scheme
March 1996 (vol. 45 no. 3)
pp. 278-293

Abstract—This paper introduces a novel superscalar microarchitecture, called IAS-S, and its related software techniques. We treat two basic problems in superscalar machines. First, we seek a feasible hardware platform that allows the compiler to perform more aggressive instruction scheduling. Second, we develop an effective means of communication between the instruction scheduler and the register allocator, so that inadequate register allocation does not result in poor instruction schedules. For the first part, IAS-S employs the Conjugate Register File (CRF) scheme to support multilevel instruction boosting, so that more of a program's instruction-level parallelism can be identified at compile time. For the second part, instruction scheduling in the IAS-S compiler consists of two passes, prepass and postpass, and a scheduling-conflict graph is built for the register allocator during prepass scheduling. In this manner, the register allocator can take into account the potential benefit to later postpass instruction scheduling and thus avoid inadequate register allocation.
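The abstract does not describe the internals of the CRF, so the following is only a generic sketch of multilevel boosting in the style of Smith, Lam, and Horowitz, not the IAS-S design. It assumes one shadow buffer per boosting level (level k holds results of instructions moved above k unresolved branches) and a single predicted path; the class and method names are hypothetical. A correctly predicted branch commits the level-1 results to the sequential state and shifts the higher levels down by one; a misprediction squashes every boosted result.

```python
from dataclasses import dataclass, field


@dataclass
class BoostedRegisterFile:
    """Hypothetical model of multilevel boosting: sequential state plus one
    shadow buffer per boosting level. This is NOT the CRF design itself."""
    num_regs: int
    max_level: int
    seq: list = field(init=False)
    shadow: list = field(init=False)

    def __post_init__(self):
        # Architectural (committed) register values.
        self.seq = [0] * self.num_regs
        # shadow[k - 1] holds results of instructions boosted above k branches.
        self.shadow = [dict() for _ in range(self.max_level)]

    def write(self, reg, value, level=0):
        if level == 0:
            self.seq[reg] = value          # ordinary, non-speculative write
        elif 1 <= level <= self.max_level:
            self.shadow[level - 1][reg] = value  # buffered speculative write
        else:
            raise ValueError("boosting level out of range")

    def read(self, reg, level=0):
        # An instruction at boosting level L sees its own level first, then
        # lower levels, then the sequential state (newest value wins).
        for k in range(level, 0, -1):
            if reg in self.shadow[k - 1]:
                return self.shadow[k - 1][reg]
        return self.seq[reg]

    def resolve_branch(self, predicted_correctly):
        if predicted_correctly:
            # Level-1 results become architectural; higher levels move down.
            for reg, value in self.shadow[0].items():
                self.seq[reg] = value
            self.shadow = self.shadow[1:] + [dict()]
        else:
            # Misprediction: discard every boosted result.
            self.shadow = [dict() for _ in range(self.max_level)]
```

For instance, a value written at level 1 stays invisible to non-boosted reads until `resolve_branch(True)` is called, and it disappears entirely if the branch is instead resolved as mispredicted.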
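The abstract does not define the edges of the scheduling-conflict graph. The sketch below assumes that an edge between two virtual registers means that giving them the same physical register would introduce a false (anti or output) dependence that restricts postpass scheduling. Under that assumption, the allocator treats interference edges as hard constraints and scheduling-conflict edges as soft preferences. It uses plain greedy coloring in a given order rather than Chaitin-style simplify/select, and the function names are hypothetical, not taken from the IAS-S compiler.

```python
from collections import defaultdict


def adjacency(edges):
    """Build a symmetric adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def allocate(vregs, interference, sched_conflict, num_phys):
    """Hypothetical greedy allocator: interference edges are hard constraints,
    scheduling-conflict edges are soft preferences.
    Returns (assignment, spilled)."""
    assignment = {}
    spilled = []
    for v in vregs:
        # Registers already held by interfering neighbors are illegal.
        taken = {assignment[u] for u in interference.get(v, ()) if u in assignment}
        legal = [r for r in range(num_phys) if r not in taken]
        if not legal:
            spilled.append(v)
            continue
        # Avoid registers whose reuse would add a false dependence that the
        # postpass scheduler cannot move around, if any alternative exists.
        discouraged = {assignment[u] for u in sched_conflict.get(v, ()) if u in assignment}
        preferred = [r for r in legal if r not in discouraged]
        assignment[v] = (preferred or legal)[0]
    return assignment, spilled
```

With two physical registers and the edges interferes(a, b) and conflicts(a, c), this allocator gives `c` register 1 rather than reusing register 0 from `a`, which a conflict-blind pass (an empty `sched_conflict`) would pick.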

Index Terms:
Instruction-level parallelism, speculative execution, superscalar processors, multilevel boosting, shadow register file, conjugate register file, scheduling-conflict graph.
Citation:
Meng-chou Chang, Feipei Lai, "Efficient Exploitation of Instruction-Level Parallelism for Superscalar Processors by the Conjugate Register File Scheme," IEEE Transactions on Computers, vol. 45, no. 3, pp. 278-293, March 1996, doi:10.1109/12.485567