Inherently Lower-Power High-Performance Superscalar Architectures
March 2001 (vol. 50 no. 3)
pp. 268-285

Abstract: In recent years, reducing power has become an important design goal for high-performance microprocessors. This work attempts to bring the power issue to the earliest phases of microprocessor development, in particular, the stage of defining a chip microarchitecture. We investigate microarchitecture-level power-optimization techniques for superscalar microprocessors that do not compromise performance. First, major targets for power reduction are identified within the microarchitecture: the structures where power is heavily consumed now or will be heavily consumed in next-generation superscalar processors. Then, a new, energy-efficient version of a multicluster microarchitecture is developed that reduces energy at the identified critical design points with minimal performance impact. A methodology is developed for energy-performance optimization at the microarchitecture level that generates, for a given microarchitecture, a set of energy-efficient configurations forming a convex hull in the power-performance space. The baseline and proposed multicluster architectures have been simulated in detail using the developed optimization methodology. A comparison of the two microarchitectures, both optimized for energy efficiency, shows that the multicluster architecture is potentially up to twice as energy efficient for wide-issue processors, with an advantage that grows with the issue width. Conversely, at the same power dissipation level, the multicluster architecture supports configurations with measurably higher performance than equivalent conventional designs.
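
The abstract does not say how the set of energy-efficient configurations is extracted, so the following is only an illustrative sketch of one common way to obtain such a set from simulated design points. It assumes each configuration is summarized by a (power, performance) pair, discards dominated points, and then keeps the upper envelope using a standard monotone-chain hull pass. The function name, the tuple layout, and the sample numbers are all hypothetical and are not taken from the article.

```python
from typing import List, Tuple

# Hypothetical record layout: (name, power, performance).
# Units are arbitrary; lower power and higher performance are better.
Config = Tuple[str, float, float]


def _cross(o: Config, a: Config, b: Config) -> float:
    """2-D cross product of vectors o->a and o->b in (power, performance)."""
    return (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])


def efficient_hull(configs: List[Config]) -> List[Config]:
    """Return the configurations on the upper envelope of the point set.

    Step 1 drops dominated points (some other point has no more power
    and at least as much performance). Step 2 keeps only points on the
    concave upper boundary, so each kept point is the best choice for
    some linear power/performance trade-off.
    """
    # Sort by rising power; for equal power, best performance first.
    ordered = sorted(configs, key=lambda c: (c[1], -c[2]))

    # Step 1: keep a point only if it beats every cheaper point.
    frontier: List[Config] = []
    for c in ordered:
        if not frontier or c[2] > frontier[-1][2]:
            frontier.append(c)

    # Step 2: monotone-chain pass; pop points that lie on or below
    # the segment joining their neighbours.
    hull: List[Config] = []
    for c in frontier:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], c) >= 0:
            hull.pop()
        hull.append(c)
    return hull


# Made-up design points, for illustration only.
SAMPLE: List[Config] = [
    ("A", 1.0, 1.0),
    ("B", 2.0, 2.0),
    ("C", 3.0, 2.2),
    ("D", 2.5, 1.5),
    ("E", 2.0, 1.8),
    ("F", 4.0, 3.0),
]
```

With the made-up points above, D and E are dominated by B, and C falls below the segment from B to F, so only A, B, and F remain. A real study would feed in simulated power and performance figures for each candidate configuration instead.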

Index Terms:
Low power microarchitecture, multicluster architecture, energy-efficient configurations, energy models.
Victor V. Zyuban, Peter M. Kogge, "Inherently Lower-Power High-Performance Superscalar Architectures," IEEE Transactions on Computers, vol. 50, no. 3, pp. 268-285, March 2001, doi:10.1109/12.910816