High performance and alleviated hot-spot problem in processor frontend with enhanced instruction fetch bandwidth utilization
2006 IEEE International Performance Computing and Communications Conference (2006)
Phoenix, AZ, USA
Apr. 10, 2006 to Apr. 12, 2006
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/.2006.1629391
P. Rajamani, Dept. of Electr. Eng., Texas Univ., Dallas, TX, USA
J.P. Shah, Dept. of Electr. Eng., Texas Univ., Dallas, TX, USA
Current-day wide-issue processors require the fetch engine in the frontend to continuously supply instructions to the issue queue in the backend in order to extract the maximum possible instruction-level parallelism (ILP). Because the level-1 instruction cache (IL1) is accessed continuously to fetch instructions, the power dissipated by switching activity in IL1 is high and sustained, which makes IL1 one of the prominent hot-spots on the chip. In this paper, we alleviate the effect of control dependencies whenever a branch instruction forms a small loop. We use the Replicator architecture, a novel mechanism that supplies twice the number of loop instructions in the same cycle, leading to a faster supply of instructions to the backend. Extracting higher ILP in this way improves processor throughput, measured in instructions committed per cycle (IPC). The mechanism also significantly reduces the total number of IL1 accesses. Implemented in an 8-wide out-of-order issue processor, the proposed technique yields, on average, a 19% improvement in IPC and an 8.5% reduction in overall energy consumption across various processor evaluation benchmark programs. An enhanced Replicator mechanism reduces IL1 accesses further, leading to a 16% reduction in the overall energy consumed. The enhanced architecture breaks the continuity of IL1 accesses by feeding instructions to the backend by itself whenever a loop occurs. This gives IL1 switching activity a break and hence mitigates the hot-spot problem in the processor frontend.
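The effect described in the abstract can be illustrated with a toy fetch model. This is only a sketch under stated assumptions, not the authors' design or simulator: the function names, the rule that fetch stops at the taken loop branch, the condition that two loop copies must fit in the fetch width, and the single IL1 access in the enhanced case are all hypothetical simplifications chosen for illustration. It does not attempt to reproduce the reported 19%, 8.5%, or 16% figures.

```python
from math import ceil


def baseline_fetch(loop_len, iterations, fetch_width=8):
    """Assumed baseline: fetch stops at the taken loop branch, so each
    iteration needs ceil(loop_len / fetch_width) cycles, and every fetch
    cycle is one IL1 access."""
    cycles = ceil(loop_len / fetch_width) * iterations
    return {"cycles": cycles, "il1_accesses": cycles}


def replicated_fetch(loop_len, iterations, fetch_width=8):
    """Assumed replication: if two copies of the loop body fit in the
    fetch width, two iterations are supplied per cycle. Otherwise the
    model falls back to the baseline."""
    if 2 * loop_len > fetch_width:
        return baseline_fetch(loop_len, iterations, fetch_width)
    cycles = ceil(iterations / 2)
    return {"cycles": cycles, "il1_accesses": cycles}


def enhanced_fetch(loop_len, iterations, fetch_width=8):
    """Assumed enhanced variant: the loop is captured on its first fetch
    and then supplied without touching IL1 again (hypothetical: exactly
    one IL1 access per captured loop)."""
    result = replicated_fetch(loop_len, iterations, fetch_width)
    if 2 * loop_len <= fetch_width:
        result["il1_accesses"] = 1
    return result


if __name__ == "__main__":
    for name, model in [("baseline", baseline_fetch),
                        ("replicated", replicated_fetch),
                        ("enhanced", enhanced_fetch)]:
        print(name, model(loop_len=3, iterations=100))
```

In this model, a 3-instruction loop run 100 times on an 8-wide fetch engine takes 100 fetch cycles and 100 IL1 accesses in the baseline, 50 of each with replication, and 50 cycles with a single IL1 access in the enhanced variant. A 5-instruction loop does not fit twice into 8 slots, so the model leaves it at the baseline cost.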
bandwidth utilization, fetch engine, frontend processor, backend queue, instruction level parallelism, ILP, level-1 instruction cache, IL1, switching activity, branch instruction, replicator architecture, energy consumption, processor evaluation, benchmark program, hot-spot problem
P. Rajamani, J. Shah, V. Sankaranarayanan and R. Sangireddy, "High performance and alleviated hot-spot problem in processor frontend with enhanced instruction fetch bandwidth utilization," 2006 IEEE International Performance Computing and Communications Conference (PCC), Phoenix, AZ, USA, 2006, pp. 13.