Issue No.07 - July (2008 vol.57)
pp: 940-951
Patrick Ndai, Purdue University, West Lafayette
Swarup Bhunia, Case Western Reserve University, Cleveland
Amit Agarwal, Intel Corporation, Hillsboro
Kaushik Roy, Purdue University, West Lafayette
Within-die parameter variations can cause a wide delay distribution among otherwise identical functional units in superscalar processors. Conventionally, the operating frequency is reduced to accommodate the slowest unit, which degrades throughput. We present a low-overhead design technique that sets the operating frequency of a superscalar processor based on the faster units and allows the slower units more cycles to complete. We propose an associated priority scheduling strategy that assigns instructions to the functional units so as to maximize throughput. Simulation results on a set of benchmarks show that by assigning a higher scheduling priority to faster units, we can achieve an 18% improvement in performance on average with negligible design overhead.
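The idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator or scheduler; it assumes non-pipelined units, fully independent operations, and made-up unit delays, and the names `cycles_per_op` and `run` are hypothetical. It compares clocking at the slowest unit's delay against clocking at the fast units' delay while giving the slow unit extra cycles, with free units tried fastest first.

```python
import math

def cycles_per_op(unit_delays_ns, clock_ns):
    """Cycles each unit needs per operation at a given clock period."""
    return [math.ceil(d / clock_ns) for d in unit_delays_ns]

def run(num_ops, latencies, fastest_first=True):
    """Issue independent ops to free units each cycle; return the finish cycle.

    latencies[i] is the (assumed, non-pipelined) cycle count of unit i.
    With fastest_first=True, free units are tried in order of increasing latency.
    """
    order = sorted(range(len(latencies)),
                   key=lambda i: latencies[i],
                   reverse=not fastest_first)
    busy_until = [0] * len(latencies)
    cycle = finish = 0
    remaining = num_ops
    while remaining > 0:
        for i in order:
            if remaining and busy_until[i] <= cycle:
                busy_until[i] = cycle + latencies[i]
                finish = max(finish, busy_until[i])
                remaining -= 1
        cycle += 1
    return finish

# Illustrative (invented) delays: three fast units and one slow unit.
delays_ns = [1.0, 1.0, 1.0, 1.8]

# Conventional: the clock period matches the slowest unit; every unit takes 1 cycle.
conv_cycles = run(8, cycles_per_op(delays_ns, 1.8))
# Variable-cycle: the clock period matches the fast units; the slow unit takes 2 cycles.
var_cycles = run(8, cycles_per_op(delays_ns, 1.0))

print("conventional:", conv_cycles * 1.8, "ns")
print("variable-cycle:", var_cycles * 1.0, "ns")
```

In this toy setting the variable-cycle configuration needs more cycles but each cycle is shorter, so total time drops. Calling `run(1, [1, 1, 1, 2], fastest_first=False)` shows why unit priority matters: a lone operation sent to the slow unit finishes a cycle later than one sent to a fast unit.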
Superscalar processors, scheduling, variable-cycle functional unit, process variation, speed binning.
Patrick Ndai, Swarup Bhunia, Amit Agarwal, Kaushik Roy, "Within-Die Variation-Aware Scheduling in Superscalar Processors for Improved Throughput", IEEE Transactions on Computers, vol.57, no. 7, pp. 940-951, July 2008, doi:10.1109/TC.2008.40