In a high-performance superscalar processor, the instruction scheduler often scales poorly and is highly complex because of the expensive wakeup operation. Detailed simulation-based analyses show that 95% of the wakeup distances between two dependent instructions are short, within 16 instructions, and 99% are within 31 instructions. We apply this wakeup spatial locality to the design of both conventional CAM-based and matrix-based wakeup logic. By limiting the wakeup coverage to i + 16 instructions, where 0 ≤ i ≤ 15, for 16-entry segments, the proposed wakeup designs confine the wakeup operation to two 16-entry segments (matrix-based) or three 16-entry segments (CAM-based), regardless of the issue window size. The experimental results show that for an issue window of 128 entries (IW128) or 256 entries (IW256), the proposed CAM-based wakeup locality design reduces power consumption by 65% (IW128) and 76% (IW256) and wakeup latency by 44% (IW128) and 78% (IW256) compared with the conventional CAM-based design, with almost no performance loss. For matrix-based wakeup logic, applying wakeup locality drastically reduces the area cost. Extensive simulation results, including comparisons with previous work, show that wakeup spatial locality is the key to achieving scalability in future sophisticated instruction schedulers.
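The coverage rule above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the authors' design: it assumes issue-window slots are numbered in program order (oldest = 0, no circular wraparound), and it reads "i + 16 instructions" as the i older entries of a consumer's own 16-entry segment plus the 16 entries of the preceding segment. The function names (covered_producers, segments_touched, coverage_ratio) and the dependence pairs are hypothetical. How the real hardware handles dependences that fall outside the coverage is not stated in the abstract, so the model only counts them.

```python
"""Toy model of segment-limited wakeup coverage (hypothetical names).

Assumptions (not from the paper): slots are numbered in program order,
oldest = 0, with no circular wraparound; an entry at offset i within its
16-entry segment can be woken only by the i older entries of its own
segment and the 16 entries of the preceding segment (i + 16 producers).
Entries in segment 0 have no preceding segment and see only i producers.
"""

SEGMENT = 16  # segment size used in the abstract


def covered_producers(slot: int) -> range:
    """Producer slots that the entry at `slot` can observe under the assumed rule."""
    seg_start = (slot // SEGMENT) * SEGMENT
    return range(max(0, seg_start - SEGMENT), slot)


def segments_touched(slot: int) -> set[int]:
    """Segment indices involved in waking the consumer at `slot`."""
    return {p // SEGMENT for p in covered_producers(slot)} | {slot // SEGMENT}


def coverage_ratio(deps: list[tuple[int, int]]) -> float:
    """Fraction of (producer, consumer) slot pairs the limited logic can wake."""
    if not deps:
        return 1.0
    hits = sum(1 for prod, cons in deps if prod in covered_producers(cons))
    return hits / len(deps)


if __name__ == "__main__":
    window = 128  # IW128
    # Every entry involves at most two segments, independent of window size.
    print(max(len(segments_touched(s)) for s in range(window)))  # -> 2
    # The entry at offset 5 of segment 3 (slot 53) observes 5 + 16 = 21 producers.
    print(len(covered_producers(53)))  # -> 21
    # Hypothetical dependences with distances 1, 10, 20, and 40.
    deps = [(52, 53), (43, 53), (33, 53), (13, 53)]
    print(coverage_ratio(deps))  # -> 0.75
```

Under this reading, any dependence of distance 16 or less is always covered, while distances up to 31 are covered only when the consumer sits late enough in its segment (maximum reach 16 + 15 = 31). This lines up with the 95% and 99% figures quoted above, but it remains an interpretation of the abstract rather than the paper's exact circuit.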
CAM-based wakeup logic, issue logic, low power, matrix-based wakeup logic, scalable instruction scheduler, wakeup spatial locality

K. Hsiao and C. Chen, "Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality," in IEEE Transactions on Computers, vol. 56, pp. 1534-1548, 2007.