G.X. Tyson, M. Smelyanskyi, E.S. Davidson, "Evaluating the Use of Register Queues in Software Pipelined Loops," IEEE Transactions on Computers, vol. 50, no. 8, pp. 769-783, Aug. 2001. ISSN 0018-9340. DOI: 10.1109/TC.2001.947006. IEEE Computer Society, Los Alamitos, CA, USA.

Keywords: software pipelining, modulo variable expansion, rotating register file, register queues, VLIW, register connection.

Abstract: In this paper, we examine the effectiveness of a new hardware mechanism called Register Queues (RQs), which decouples the architected register space from the physical registers. Using RQs, the compiler can allocate physical registers to hold the live values of a software-pipelined loop while minimizing the pressure placed on architected registers. We show that this decoupling can greatly increase the applicability of software pipelining, even as memory latencies increase. RQs combine the major aspects of two existing techniques, rotating register files and register connection, to generate efficient software pipeline schedules. Through the use of RQs, we can minimize the register pressure and code expansion caused by software pipelining. We demonstrate the effect of incorporating register queues and software pipelining on 983 loops taken from the Perfect Club, the SPEC suites, and the Livermore Kernels.
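The following Python sketch makes the register-pressure and code-expansion trade-off concrete. For a toy modulo schedule, it compares the unroll factor and architected registers needed by modulo variable expansion (MVE) against a queue-per-value organization. It is a simplified model under stated assumptions, not the paper's method.

* The `Lifetime` type, the function names, and the example values are hypothetical.
* MVE follows a common formulation: the kernel is unrolled u = max ceil(L/II) times, and each value's register count is rounded up to a divisor of u.
* The queue model assumes, purely for illustration, that each loop-variant value gets one queue reached through a single architected register name. The queue's depth equals the value's number of overlapping instances. This is not necessarily how the paper organizes RQs.
* MaxLive is the usual lower bound on the physical registers that any of these schemes needs.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Lifetime:
    """One loop-variant value in a single iteration's modulo schedule (toy model).

    The value occupies a register from cycle `start` (inclusive) to
    `start + length` (exclusive); a new iteration starts every II cycles.
    """
    name: str
    start: int
    length: int


def max_live(values, ii):
    """Largest number of values live at once in the steady-state kernel."""
    slots = [0] * ii
    for v in values:
        # Instances repeat every `ii` cycles, so lifetime cycle t lands in slot t % ii.
        for t in range(v.start, v.start + v.length):
            slots[t % ii] += 1
    return max(slots)


def smallest_divisor_at_least(u, q):
    """Smallest d with q <= d <= u that divides u (u itself always qualifies)."""
    return next(d for d in range(q, u + 1) if u % d == 0)


def register_summary(values, ii):
    # Number of simultaneously live (overlapping) instances of each value.
    copies = {v.name: ceil(v.length / ii) for v in values}
    unroll = max(copies.values())
    return {
        # Lower bound on physical registers for any allocation scheme.
        "max_live": max_live(values, ii),
        # MVE: kernel unrolled `unroll` times; each value's register count is
        # rounded up to a divisor of `unroll` so its names cycle consistently.
        "mve_unroll": unroll,
        "mve_arch_regs": sum(smallest_divisor_at_least(unroll, q) for q in copies.values()),
        # Assumed queue model: one architected name per value, no unrolling,
        # queue depth equal to the value's overlapping instances.
        "rq_arch_regs": len(values),
        "rq_queue_depths": copies,
    }


if __name__ == "__main__":
    # Hypothetical loop with three values and an initiation interval of 2 cycles.
    example = [Lifetime("a", 0, 5), Lifetime("b", 1, 2), Lifetime("c", 2, 4)]
    print(register_summary(example, ii=2))
```

For these example values, the sketch reports the following:

* MaxLive is 6.
* MVE needs an unroll factor of 3 and 7 architected registers. Value `c` has 2 overlapping copies, rounded up to 3 so that the count divides the unroll factor.
* The assumed queue model needs only 3 architected names, with queue depths 3, 1, and 2.

The physical storage is similar in both cases. What changes is how many architected names are consumed and whether the kernel must be replicated.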

