Complexity-Effective Reorder Buffer Designs for Superscalar Processors
June 2004 (vol. 53 no. 6)
pp. 653-665

Abstract—All contemporary dynamically scheduled processors support register renaming to cope with false data dependencies. One way to implement register renaming is to use the slots within the Reorder Buffer (ROB) as physical registers. In such designs, the ROB is a large multiported structure that occupies a significant portion of the die area and dissipates a sizable fraction of the total chip power. The heavily ported ROB is also likely to have a large delay that can limit the processor clock rate. We consider several approaches for reducing the ROB complexity in processors that use the ROB slots to implement physical registers. The first approach exploits the fact that the bulk of source operand reads are satisfied through forwarding or by reading committed register values. Our technique completely eliminates the ROB read ports needed for reading source operands. A small set of associatively addressed retention latches compensates for the resulting performance degradation by caching the most recently produced results. The second technique relies on a distributed implementation that spreads the centralized ROB structure across the function units (FUs), with each distributed component sized to match its FU's workload and having one write port and two read ports. The third approach combines retention latches with a distributed ROB implementation built from minimally ported components. The net result of combining the two techniques is a distributed ROB with minimal conflicts over the read ports and no conflicts over the write ports. Our designs are evaluated through simulations of the SPEC 2000 benchmarks and measurements of actual ROB layouts in a 0.18 micron CMOS process.
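The retention-latch idea described in the abstract can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the authors' design: the class name `RetentionLatches`, the tagging of each entry with the producer's ROB slot index, FIFO replacement, and the `invalidate` hook are all hypothetical choices made for illustration, and the handling of a latch miss is left out entirely.

```python
from collections import OrderedDict
from typing import Optional


class RetentionLatches:
    """Hypothetical model of a small, fully associative set of retention latches.

    Each entry is tagged with the ROB slot index of the producing instruction
    (an assumption). Replacement is FIFO (also an assumption): a newly written
    result evicts the oldest one still held, so the latches always contain the
    most recently produced results.
    """

    def __init__(self, num_latches: int) -> None:
        if num_latches <= 0:
            raise ValueError("num_latches must be positive")
        self.num_latches = num_latches
        # Maps rob_slot -> value; insertion order tracks result age.
        self._entries: "OrderedDict[int, int]" = OrderedDict()

    def write(self, rob_slot: int, value: int) -> None:
        """Capture a result at writeback (modeled as parallel to the ROB write)."""
        if rob_slot in self._entries:
            # The slot was reused by a newer producer; drop the stale value.
            del self._entries[rob_slot]
        elif len(self._entries) == self.num_latches:
            self._entries.popitem(last=False)  # evict the oldest result (FIFO)
        self._entries[rob_slot] = value

    def read(self, rob_slot: int) -> Optional[int]:
        """Associative lookup by ROB slot tag; None signals a latch miss."""
        return self._entries.get(rob_slot)

    def invalidate(self, rob_slot: int) -> None:
        """Hypothetical hook to drop an entry, e.g. when its producer is squashed."""
        self._entries.pop(rob_slot, None)


if __name__ == "__main__":
    latches = RetentionLatches(num_latches=4)
    for slot in range(6):  # six results written into four latches
        latches.write(slot, slot * 10)
    print(latches.read(5))  # 50
    print(latches.read(0))  # None: evicted as the oldest result
```

Because only the newest results are retained, a hit depends on a consumer reading its operand soon after the producer writes back; the abstract's observation that most operand reads are already served by forwarding or by committed registers is what makes such a small structure plausible.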

Index Terms:
Reorder buffer, complexity-effective design, low-power datapath, register file.
Gurhan Kucuk, Dmitry V. Ponomarev, Oguz Ergin, Kanad Ghose, "Complexity-Effective Reorder Buffer Designs for Superscalar Processors," IEEE Transactions on Computers, vol. 53, no. 6, pp. 653-665, June 2004, doi:10.1109/TC.2004.5