Early Register Deallocation Mechanisms Using Checkpointed Register Files
September 2006 (vol. 55 no. 9)
pp. 1153-1166
Modern superscalar microprocessors need sizable register files to support a large number of in-flight instructions for exploiting instruction-level parallelism (ILP). An alternative to building large register files is to use a smaller number of registers, but manage them more effectively. When shrinking the register file is not the goal, more efficient register management can instead deliver higher performance. Traditional register file management mechanisms deallocate a physical register only when the next instruction writing to the same destination architectural register commits. In this paper, we propose several techniques for deallocating physical registers much earlier. Our designs rely on the use of a checkpointed register file (CRF), where a local shadow copy of each bitcell temporarily saves the values of early-deallocated registers in case they are needed to recover from branch mispredictions or to reconstruct the precise state after exceptions or interrupts. The proposed techniques try to release registers as soon as possible and are more aggressive than previously proposed schemes for early deallocation of registers.
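The traditional release rule and the shadow-copy idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's design: the names `RenameModel` and `CheckpointedRegister`, the register counts, and the single-entry shadow storage are all hypothetical and chosen purely for illustration. The abstract does not give the actual early-deallocation conditions, so none are modeled here.

```python
from collections import deque


class RenameModel:
    """Toy model of conventional register renaming (hypothetical, illustrative only)."""

    def __init__(self, num_arch, num_phys):
        # Initially, architectural register i maps to physical register i.
        self.map = list(range(num_arch))
        self.free = deque(range(num_arch, num_phys))
        # In-flight instructions in program order: (arch_dest, new_phys, prev_phys).
        self.rob = deque()

    def rename(self, arch_dest):
        """Allocate a physical register for an instruction that writes arch_dest."""
        if not self.free:
            return None  # rename stalls: no free physical register
        new_phys = self.free.popleft()
        prev_phys = self.map[arch_dest]
        self.map[arch_dest] = new_phys
        self.rob.append((arch_dest, new_phys, prev_phys))
        return new_phys

    def commit(self):
        """Commit the oldest instruction and free the register it superseded."""
        _arch_dest, _new_phys, prev_phys = self.rob.popleft()
        # Traditional rule: the previous mapping is released only at this point.
        self.free.append(prev_phys)
        return prev_phys


class CheckpointedRegister:
    """One register with a local shadow copy (assumed single-entry for simplicity)."""

    def __init__(self):
        self.value = None
        self.shadow = None

    def checkpoint(self):
        # Save the current value so the register can be reused early.
        self.shadow = self.value

    def restore(self):
        # Recover the saved value, e.g. after a misprediction or exception.
        self.value = self.shadow
```

In this sketch, with two architectural and four physical registers, two back-to-back writes to the same architectural register exhaust the free list, and a third write stalls until the first one commits. That stall is the pressure that earlier deallocation aims to relieve, with the shadow copy preserving the old value in case recovery needs it.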

Index Terms:
Superscalar processors, register file optimization, precise interrupts.
Citation:
Oguz Ergin, Deniz Balkan, Dmitry Ponomarev, Kanad Ghose, "Early Register Deallocation Mechanisms Using Checkpointed Register Files," IEEE Transactions on Computers, vol. 55, no. 9, pp. 1153-1166, Sept. 2006, doi:10.1109/TC.2006.145