State-Retentive Power Gating of Register Files in Multicore Processors Featuring Multithreaded In-Order Cores
November 2011 (vol. 60 no. 11)
pp. 1547-1560
Soumyaroop Roy, University of South Florida
Nagarajan Ranganathan, University of South Florida
Srinivas Katkoori, University of South Florida, Tampa
In this work, we investigate state-retentive power gating of register files for leakage reduction in multicore processors that support multithreading. In an in-order core, when a thread is blocked by a memory stall, its register file can be placed in a low-leakage state through power gating. When the memory stall is resolved, the register file is reactivated so that it can be accessed again. Because the contents of the register file are retained during gating, rather than lost and then restored on wake-up, this is referred to as state-retentive power gating of register files. While state-retentive power gating has been studied in the literature for single cores, this work investigates it for multicore architectures for the first time. We propose specific techniques to implement state-retentive power gating for three multicore processor configurations, distinguished by their multithreading model: 1) coarse-grained multithreading, 2) fine-grained multithreading, and 3) simultaneous multithreading. The proposed techniques can be implemented as design extensions within the control units of the in-order cores. Each technique uses two low-leakage modes: one with lower leakage savings and low wake-up latency, and one with higher leakage savings and high wake-up latency. The overhead due to wake-up latency is completely avoided in two of the techniques, while in the third it is hidden for the most part, either by overlapping the wake-up process with the thread context-switching latency or by executing instructions from other threads that are ready for execution. The proposed techniques were evaluated through simulations with multiprogrammed workloads composed of SPEC 2000 integer benchmarks. Experimental results show that, in an 8-core processor executing 64 threads, the average leakage savings were 42 percent for coarse-grained multithreading and between 7 and 8 percent for fine-grained and simultaneous multithreading.
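
The trade-off between the two low-leakage modes can be illustrated with a toy model. The sketch below is not the authors' implementation: the mode names, leakage fractions, wake-up latencies, and the selection rule (use the deeper mode only when its wake-up latency can be fully overlapped with other work) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepMode:
    name: str
    leakage_fraction: float  # leakage relative to the active state (assumed value)
    wakeup_cycles: int       # cycles needed to reactivate the register file (assumed value)


# Hypothetical parameters; none of these numbers come from the paper.
SHALLOW = SleepMode("shallow", leakage_fraction=0.6, wakeup_cycles=1)
DEEP = SleepMode("deep", leakage_fraction=0.2, wakeup_cycles=10)


def exposed_wakeup(mode: SleepMode, overlap_cycles: int) -> int:
    """Wake-up cycles that are NOT hidden behind a context switch or other threads."""
    return max(0, mode.wakeup_cycles - overlap_cycles)


def pick_mode(overlap_cycles: int) -> SleepMode:
    """Assumed policy: use the deep mode only if its wake-up is fully hidden."""
    return DEEP if exposed_wakeup(DEEP, overlap_cycles) == 0 else SHALLOW


def savings_fraction(total_cycles: int, stalls: list[int], overlap_cycles: int) -> float:
    """Fraction of register-file leakage saved over a run.

    `stalls` lists the lengths (in cycles) of the memory stalls during which
    the blocked thread's register file is gated.
    """
    mode = pick_mode(overlap_cycles)
    saved = sum(stall * (1.0 - mode.leakage_fraction) for stall in stalls)
    return saved / total_cycles


if __name__ == "__main__":
    # Two stalls totalling 300 of 1000 cycles, with 10 cycles of overlap available.
    print(pick_mode(10).name, savings_fraction(1000, [100, 200], overlap_cycles=10))
    # Same stalls, but no overlap available to hide the deep mode's wake-up.
    print(pick_mode(0).name, savings_fraction(1000, [100, 200], overlap_cycles=0))
```

In this model the achievable savings depend on how much of the run a thread spends stalled and on whether the deeper mode's wake-up latency can be hidden, which is the same qualitative trade-off the abstract describes.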


Index Terms:
CGMT, FGMT, SMT, Niagara, M5, in-order.
Citation:
Soumyaroop Roy, Nagarajan Ranganathan, Srinivas Katkoori, "State-Retentive Power Gating of Register Files in Multicore Processors Featuring Multithreaded In-Order Cores," IEEE Transactions on Computers, vol. 60, no. 11, pp. 1547-1560, Nov. 2011, doi:10.1109/TC.2010.249