Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality
November 2007 (vol. 56 no. 11)
pp. 1534-1548
In a high-performance superscalar processor, the instruction scheduler often suffers from poor scalability and high complexity because of the expensive wakeup operation. From detailed simulation-based analyses, we find that 95% of the wakeup distances between two dependent instructions are short, within 16 instructions, and 99% are within 31 instructions. We apply this wakeup spatial locality to the design of conventional CAM-based and matrix-based wakeup logic, respectively. By limiting the wakeup coverage to i + 16 instructions, where 0 ≤ i ≤ 15 for 16-entry segments, the proposed wakeup designs confine the wakeup operation to two matrix-based or three CAM-based 16-entry segments, no matter how large the issue window is. The experimental results show that, for an issue window of 128 entries (IW128) or 256 entries (IW256), the proposed CAM-based wakeup locality design reduces power consumption by 65% (IW128) and 76% (IW256) and wakeup latency by 44% (IW128) and 78% (IW256) compared to the conventional CAM-based design, with almost no performance loss. For the matrix-based wakeup logic, applying wakeup locality to the design drastically reduces the area cost. Extensive simulation results, including comparisons with previous works, show that wakeup spatial locality is the key element in achieving scalability for future sophisticated instruction schedulers.
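
The locality measurement described above can be illustrated with a minimal sketch. It assumes that the wakeup distance is the program-order gap between a consumer and the most recent producer of each of its source registers; the abstract does not give the exact definition, so this is an assumption. The trace format and the names `wakeup_distances` and `fraction_within` are hypothetical and chosen only for illustration.

```python
def wakeup_distances(trace):
    """Return producer-to-consumer distances for a toy instruction trace.

    trace: list of (dest_reg, src_regs) tuples in program order.
    Assumption: distance = consumer index - index of the latest
    earlier instruction that wrote the source register.
    """
    last_writer = {}   # register name -> index of its most recent producer
    distances = []
    for idx, (dest, srcs) in enumerate(trace):
        for reg in srcs:
            if reg in last_writer:
                distances.append(idx - last_writer[reg])
        if dest is not None:
            last_writer[dest] = idx
    return distances


def fraction_within(distances, limit):
    """Fraction of wakeup distances that are at most `limit` instructions."""
    if not distances:
        return 0.0
    return sum(d <= limit for d in distances) / len(distances)


# Toy trace (made-up data, not from the paper's benchmarks).
toy_trace = [
    ("r1", []),            # 0: produces r1
    ("r2", ["r1"]),        # 1: consumes r1 (distance 1)
    ("r3", ["r1", "r2"]),  # 2: consumes r1 (distance 2) and r2 (distance 1)
    (None, ["r3"]),        # 3: consumes r3 (distance 1)
]
toy_distances = wakeup_distances(toy_trace)
print(toy_distances)
print(fraction_within(toy_distances, 16))
```

On a real trace, calling `fraction_within` with limits of 16 and 31 would give the kind of coverage figures the abstract reports, provided this distance definition matches the one used in the paper.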

Index Terms:
CAM-based wakeup logic, issue logic, low power, matrix-based wakeup logic, scalable instruction scheduler, wakeup spatial locality
Citation:
Chung-Ho Chen, Kuo-Su Hsiao, "Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality," IEEE Transactions on Computers, vol. 56, no. 11, pp. 1534-1548, Nov. 2007, doi:10.1109/TC.2007.70743