Late Allocation and Early Release of Physical Registers
October 2004 (vol. 53 no. 10)
pp. 1244-1259
José González, IEEE Computer Society
Antonio González, IEEE Computer Society
The register file is one of the critical components of current processors in terms of access time and power consumption. Among other things, the potential to exploit instruction-level parallelism is closely related to the size and number of ports of the register file. In conventional register renaming schemes, both register allocation and release are done conservatively: a physical register is allocated at the rename stage, before it is loaded with a value, and released at the commit stage of the next instruction that redefines the same logical register, when the register is guaranteed to have no further uses. In this paper, we introduce VP-LAER, a renaming scheme that allocates registers later and releases them earlier than conventional schemes. Specifically, physical registers are allocated at the end of the execution stage and released as soon as the processor realizes that there will be no further use of them. VP-LAER enhances register utilization, that is, the fraction of allocated registers holding a value that will be read in the future. Detailed cycle-level simulations show either a significant speedup for a given register file size or a reduction in register file size for a given performance level, especially for floating-point codes, where register file pressure is usually high.
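The difference in register lifetimes described above can be illustrated with a small timing model. The sketch below is a hypothetical Python model, not the paper's simulator or the VP-LAER hardware: the `Value` fields, the made-up cycle numbers, and the release rule (free a register once the redefining instruction has been renamed and the last consumer has read the value) are assumptions chosen for illustration. It also ignores branch misprediction and precise-exception recovery, which a real early-release scheme must handle. For each policy it reports the register-cycles occupied, the utilization (cycles holding a value still to be read, divided by cycles allocated), and the peak number of registers allocated at once.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Value:
    """Hypothetical timing (in cycles) of one value written to a physical register."""
    rename: int        # producer passes the rename stage
    writeback: int     # producer finishes execution and has a result
    last_read: int     # last consumer reads the value
    redef_rename: int  # next writer of the same logical register is renamed
    redef_commit: int  # that next writer commits


Interval = Tuple[int, int]  # half-open [start, end) in cycles


def conventional(v: Value) -> Interval:
    # Allocate at rename; release when the redefining instruction commits.
    return v.rename, v.redef_commit


def late_alloc_early_release(v: Value) -> Interval:
    # Allocate only when the result is produced; release once no new reader
    # can be mapped (redefiner renamed) and all mapped readers have read.
    # Idealized assumption: ignores misspeculation and exception recovery.
    return v.writeback, max(v.last_read, v.redef_rename)


def stats(values: List[Value], policy: Callable[[Value], Interval]) -> Tuple[int, float]:
    """Return (allocated register-cycles, utilization) under a policy."""
    allocated = sum(end - start for start, end in map(policy, values))
    # A register is useful while it holds a value that will still be read.
    useful = sum(v.last_read - v.writeback for v in values)
    return allocated, useful / allocated if allocated else 0.0


def peak_registers(values: List[Value], policy: Callable[[Value], Interval]) -> int:
    """Maximum number of physical registers allocated in the same cycle."""
    events = []
    for start, end in map(policy, values):
        events.append((start, +1))
        events.append((end, -1))
    events.sort()  # at equal cycles, releases (-1) sort before allocations (+1)
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


# Made-up trace of three values, purely for illustration.
trace = [
    Value(rename=0, writeback=5, last_read=7, redef_rename=2, redef_commit=12),
    Value(rename=1, writeback=9, last_read=10, redef_rename=4, redef_commit=14),
    Value(rename=2, writeback=3, last_read=4, redef_rename=8, redef_commit=15),
]

if __name__ == "__main__":
    for policy in (conventional, late_alloc_early_release):
        allocated, util = stats(trace, policy)
        print(f"{policy.__name__}: {allocated} register-cycles, "
              f"utilization {util:.2f}, peak {peak_registers(trace, policy)}")
```

On this toy trace the conventional policy occupies 38 register-cycles with a utilization of about 0.11 and needs 3 registers at peak, while the late-allocation/early-release policy occupies 8 register-cycles with a utilization of 0.5 and peaks at 2. These figures reflect only the invented trace, not the paper's measurements.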

Index Terms:
Register renaming, out-of-order processors, register file optimization, physical register allocation and releasing, precise exceptions.
Citation:
Teresa Monreal, Víctor Viñals, José González, Antonio González, Mateo Valero, "Late Allocation and Early Release of Physical Registers," IEEE Transactions on Computers, vol. 53, no. 10, pp. 1244-1259, Oct. 2004, doi:10.1109/TC.2004.79