Integrated Analysis of Power and Performance for Pipelined Microprocessors
August 2004 (vol. 53 no. 8)
pp. 1004-1016

Abstract—Choosing the pipeline depth of a microprocessor is one of the most critical design decisions that an architect must make in the concept phase of a microprocessor design. To be successful in today's cost/performance marketplace, modern CPU designs must effectively balance both performance and power dissipation. The choice of pipeline depth and target clock frequency has a critical impact on both of these metrics. In this paper, we describe an optimization methodology based on both analytical models and detailed simulations for power and performance as a function of pipeline depth. Our results for a set of SPEC2000 applications show that, when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of our energy models. Finally, we discuss the potential risks in design quality for overly aggressive or conservative choices of pipeline depth.

Index Terms:
Low-power design, energy-aware systems, pipeline processors, performance analysis and design aids, microprocessors and microcomputers.
Citation:
Victor Zyuban, David Brooks, Viji Srinivasan, Michael Gschwind, Pradip Bose, Philip N. Strenski, Philip G. Emma, "Integrated Analysis of Power and Performance for Pipelined Microprocessors," IEEE Transactions on Computers, vol. 53, no. 8, pp. 1004-1016, Aug. 2004, doi:10.1109/TC.2004.46