Predicting and Exploiting Transient Values for Reducing Register File Pressure and Energy Consumption
January 2008 (vol. 57 no. 1)
pp. 82-95
High-performance microprocessors use large, heavily ported physical register files (RFs) to increase instruction throughput. The high complexity and power dissipation of such RFs stem mainly from the need to keep every result in the RF for many cycles after it is generated. We observed that a significant fraction (about 45%) of result values are never read from the register file and are not needed to recover from branch mispredictions. In this paper, we propose SPARTAN, a set of microarchitectural extensions that predicts such transient values and, in many cases, avoids allocating physical registers to them altogether. We show that transient values can be predicted with more than 97% accuracy on average across the simulated SPEC 2000 benchmarks. We evaluate SPARTAN on a variety of configurations and show that it delivers significant improvements in performance and energy efficiency. Furthermore, we directly compare SPARTAN against a number of previously proposed register optimization schemes and show that our technique significantly outperforms all of them.
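To make the prediction step more concrete, here is a minimal Python sketch of one way a per-instruction transient-value predictor could work. It is not SPARTAN's predictor, whose organization the abstract does not describe. The PC-indexed table of 2-bit saturating counters, the names `TransientPredictor` and `run`, the table size, and the `(pc, was_transient)` trace format are all hypothetical. The trace is assumed to already carry the ground-truth label, i.e. whether each result was never read from the RF and was not needed for misprediction recovery.

```python
class TransientPredictor:
    """Hypothetical predictor (not SPARTAN's design): a direct-mapped table of
    2-bit saturating counters indexed by low PC bits. Counter values 2 and 3
    predict that the instruction's result will be transient."""

    def __init__(self, index_bits: int = 10) -> None:
        self.index_bits = index_bits
        # Assumption for illustration: start every counter at 1 ("weakly not
        # transient"), treating a wrong "transient" prediction as the costlier error.
        self.table = [1] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        # Drop the 2 low bits (assumes 4-byte-aligned instructions), then mask.
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, was_transient: bool) -> None:
        # Saturating increment on a transient outcome, decrement otherwise.
        i = self._index(pc)
        if was_transient:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def run(trace) -> float:
    """Replay (pc, was_transient) pairs and return the prediction accuracy."""
    pred = TransientPredictor()
    correct = total = 0
    for pc, outcome in trace:
        correct += pred.predict(pc) == outcome  # predict before training
        total += 1
        pred.update(pc, outcome)
    return correct / total if total else 0.0


if __name__ == "__main__":
    toy = [(0x400, True), (0x404, False)] * 50
    print(run(toy))
```

For the toy trace, which alternates a PC whose results are always transient with one whose results never are, `run` returns 0.99. The only miss is the first occurrence of the transient PC, while its counter warms up from its initial value.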


Index Terms:
General, Pipeline processors, Microprocessors, Performance attributes
Citation:
Deniz Balkan, Joseph Sharkey, Dmitry V. Ponomarev, Kanad Ghose, "Predicting and Exploiting Transient Values for Reducing Register File Pressure and Energy Consumption," IEEE Transactions on Computers, vol. 57, no. 1, pp. 82-95, Jan. 2008, doi:10.1109/TC.2007.70785