Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures
June 2006 (vol. 55 no. 6)
pp. 672-685
In modern high-performance processors, the complexity of the register rename logic grows with the pipeline width, increasing both the renaming delay and the power consumption. The rename logic in the front end of the processor is one of the largest contributors to peak temperatures on the chip, so reducing its power consumption deserves attention. Further, in clustered microarchitectures the rename map table at the front end is shared by the clusters, so its critical path delay should not become the bottleneck that determines the processor clock cycle time. Analysis of the SPEC2000 integer benchmark programs reveals that, when the programs run on a 4-wide processor, at most one two-source instruction (an instruction with two source registers) is renamed in a cycle for 94 percent of the total execution time. Similarly, on an 8-wide processor, at most one two-source instruction is renamed in a cycle for 92 percent of the total execution time. The port bandwidth of the rename map table is therefore highly underutilized for a significant portion of the time. Based on this analysis, we propose a novel technique that significantly reduces the number of ports in the rename map table. The technique is easy to implement and reduces the access time, power, and area of the rename logic without adding power, area, or delay overheads to any other logic on the chip. It renames instructions in the order of their fetch, with no significant impact on the processor's performance.
In an 8-wide processor, a conventional rename map table in an integer pipeline uses 16 ports to look up source operands. With the proposed technique, a rename map table with nine ports reduces access time, power, and area by 14 percent, 42 percent, and 49 percent, respectively, with only a 4.7 percent loss in instructions committed per cycle (IPC). In a 4-wide processor, the technique reduces access time, power, and area by 7 percent, 38 percent, and 59 percent, respectively, with an IPC loss of only 4.4 percent.
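
The port-count argument can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not describe the stall policy, so the code assumes that a rename group is processed strictly in fetch order and that renaming stops at the first instruction whose source lookups no longer fit in the remaining read ports. The function names and the list-of-source-counts encoding are hypothetical.

```python
from typing import List


def conventional_read_ports(width: int) -> int:
    """Source-lookup ports when every instruction may have two sources."""
    return 2 * width


def rename_in_fetch_order(src_counts: List[int], read_ports: int) -> int:
    """Return how many instructions of a fetch group are renamed this cycle.

    src_counts[i] is the number of source registers (0, 1, or 2) of the
    i-th instruction in fetch order. ASSUMPTION: renaming stops at the
    first instruction that would exceed the available read ports, so that
    fetch order is preserved; the remaining instructions wait for the
    next cycle.
    """
    used = 0
    renamed = 0
    for n in src_counts:
        if used + n > read_ports:
            break  # not enough ports left; stall here to keep fetch order
        used += n
        renamed += 1
    return renamed


if __name__ == "__main__":
    width = 8
    print(conventional_read_ports(width))  # 16 ports in the conventional table
    # One two-source instruction in the group: all 8 fit in nine ports.
    print(rename_in_fetch_order([1, 2, 1, 0, 1, 1, 1, 1], 9))
    # Two two-source instructions: the last instruction has to wait.
    print(rename_in_fetch_order([2, 2, 1, 1, 1, 1, 1, 1], 9))
```

Under this assumed policy, a nine-port table renames a full 8-wide group whenever the group contains at most one two-source instruction, which is the case the abstract reports for 92 percent of execution time.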

Index Terms:
Wide-issue processors, integer pipeline, rename logic complexity, front-end power consumption.
Citation:
Rama Sangireddy, "Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures," IEEE Transactions on Computers, vol. 55, no. 6, pp. 672-685, June 2006, doi:10.1109/TC.2006.88