Evaluating the Use of Register Queues in Software Pipelined Loops
August 2001 (vol. 50 no. 8)
pp. 769-783

Abstract—In this paper, we examine the effectiveness of a new hardware mechanism, called Register Queues (RQs), which effectively decouples the architected register space from the physical registers. Using RQs, the compiler can allocate physical registers to store live values in the software pipelined loop while minimizing the pressure placed on architected registers. We show that decoupling the architected register space from the physical register space can greatly increase the applicability of software pipelining, even as memory latencies increase. RQs combine the major aspects of existing rotating register file and register connection techniques to generate efficient software pipeline schedules. Through the use of RQs, we can minimize the register pressure and code expansion caused by software pipelining. We demonstrate the effect of incorporating register queues and software pipelining with 983 loops taken from the Perfect Club, the SPEC suites, and the Livermore Kernels.
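The decoupling described in the abstract can be illustrated with a toy model. The sketch below is not the paper's hardware design. It assumes, going only by the name "queue", that a write to a queue-mapped architected register appends a value and a read removes the oldest one. The names `RegisterQueue`, `pipelined_scale`, and `load_latency` are hypothetical, and `load_latency` is assumed to be at least 1.

```python
from collections import deque


class RegisterQueue:
    """Toy model (an assumption, not the paper's design): one architected
    register name backed by a FIFO of physical registers."""

    def __init__(self, depth):
        self.depth = depth      # number of physical registers behind the name
        self.slots = deque()
        self.peak = 0           # highest occupancy seen so far

    def write(self, value):
        if len(self.slots) >= self.depth:
            raise OverflowError("register queue is full")
        self.slots.append(value)
        self.peak = max(self.peak, len(self.slots))

    def read(self):
        # Returns the oldest live value and frees its physical register.
        return self.slots.popleft()


def pipelined_scale(data, factor, load_latency):
    """Software-pipelined loop: each load is issued `load_latency` kernel
    iterations before the multiply that consumes it (load_latency >= 1)."""
    rq = RegisterQueue(depth=load_latency)
    out = []
    n = len(data)
    for i in range(n + load_latency):      # prologue + kernel + epilogue
        if i >= load_latency:
            # Consume stage of source iteration i - load_latency.
            out.append(rq.read() * factor)
        if i < n:
            # Load stage of source iteration i.
            rq.write(data[i])
    return out, rq.peak


result, physical_used = pipelined_scale([1, 2, 3, 4], factor=10, load_latency=2)
```

In this model the loop body names a single register regardless of `load_latency`. Only the queue depth, that is, the number of physical registers, grows as the latency increases, which is the kind of pressure relief on architected registers that the abstract describes.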

Index Terms:
Software pipelining, modulo variable expansion, rotating register file, register queues, VLIW, register connection.
Citation:
G.S. Tyson, M. Smelyanskiy, E.S. Davidson, "Evaluating the Use of Register Queues in Software Pipelined Loops," IEEE Transactions on Computers, vol. 50, no. 8, pp. 769-783, Aug. 2001, doi:10.1109/TC.2001.947006