Issue No.01 - January (2008 vol.57)
pp: 7-24
This paper describes a new on-demand wakeup prediction policy for instruction cache leakage control that achieves better leakage savings than prior policies while avoiding their performance overheads. The proposed policy reduces leakage energy by more than 92% with less than 0.3% performance overhead on average, whereas prior policies either incurred severe performance overhead or failed to reduce leakage energy as much. The key to the new policy is to use branch prediction information for wakeup prediction. Inserting an extra wakeup stage between branch prediction and fetch allows the branch predictor to serve also as a wakeup predictor without any additional hardware. The extra stage thus hides the wakeup penalty without affecting branch prediction accuracy. Though extra pipeline stages typically add to the branch misprediction penalty, in this case the extra wakeup stage on the normal fetch path can be overlapped with misprediction recovery. With such consistently accurate wakeup prediction, all cache lines except the next expected cache line(s) stay in leakage-saving mode, minimizing leakage energy. We focus on super-drowsy leakage control using reduced supply voltage because it is well suited to the instruction cache's criticality. The proposed policy can be applied to other leakage-saving circuit techniques as long as the wakeup penalty is at most one cycle.
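The on-demand idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator or hardware design. The class `DrowsyICache`, the function `run`, and the trace format are hypothetical names introduced only for illustration. It assumes one fetch per cycle, a one-cycle wakeup penalty whenever the fetched line was not woken in advance, and a predictor that names exactly one next line. It does not model the overlap of the wakeup stage with misprediction recovery, and the drowsy fraction it reports is a count of line-cycles, not an energy figure.

```python
from dataclasses import dataclass, field


@dataclass
class DrowsyICache:
    """Toy I-cache: every line is drowsy unless woken for the next fetch."""
    num_lines: int
    awake: set = field(default_factory=set)
    stall_cycles: int = 0        # fetches that hit a drowsy line
    awake_line_cycles: int = 0   # line-cycles spent at full voltage
    cycles: int = 0

    def wake(self, line: int) -> None:
        # Wakeup stage: keep only the predicted next line at full voltage.
        self.awake = {line % self.num_lines}

    def fetch(self, line: int) -> None:
        line %= self.num_lines
        if line not in self.awake:
            # Wrong wakeup prediction: pay the assumed one-cycle penalty.
            self.stall_cycles += 1
            self.awake = {line}
        self.cycles += 1
        self.awake_line_cycles += len(self.awake)


def run(trace, predictions, num_lines=64):
    """Replay (actual, predicted) line indices; return stalls and drowsy fraction."""
    cache = DrowsyICache(num_lines)
    for actual, predicted in zip(trace, predictions):
        cache.wake(predicted)   # driven by the (hypothetical) branch predictor
        cache.fetch(actual)
    drowsy_fraction = 1 - cache.awake_line_cycles / (cache.cycles * num_lines)
    return cache.stall_cycles, drowsy_fraction


if __name__ == "__main__":
    # Four fetches; the last wakeup prediction is wrong.
    print(run(trace=[0, 1, 2, 3], predictions=[0, 1, 2, 7]))
```

In this model the fraction of drowsy line-cycles depends only on how many lines are kept awake per cycle, while prediction accuracy determines the stall count. That separation mirrors the abstract's argument that accurate wakeup prediction permits aggressive leakage saving without a performance cost.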
Cache memories, Microprocessors, Low-power design, Energy-aware systems
Sung Woo Chung, "On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance", IEEE Transactions on Computers, vol.57, no. 1, pp. 7-24, January 2008, doi:10.1109/TC.2007.70770