Issue No. 04 - April (2008 vol. 19)
The growth in complexity of a wide-issue processor with its pipeline width is one of the primary concerns of processor designers. In the conventional design, hardware in the processor core is laid out to handle multiple instructions with two source operands in each pipeline stage. However, analysis of SPEC2000 programs reveals that an integer program consists on average of 25.2% two-op integer instructions (both source operands in registers) and 72.5% one-op/zero-op integer instructions. Floating-point (FP) programs consist on average of 15.8% two-op integer instructions and 44.1% one-op/zero-op integer instructions. The analysis shows that the hardware laid out for worst-case requirements in the integer pipeline is highly under-utilized for a significant portion of the time. To alleviate these complexity issues, we propose the split pipeline architecture, a novel technique that distinguishes and processes instructions based on their source operand requirements. The conventional pipeline is split into two after the decode stage, and the two pipelines converge again at the execution stage. This allows instructions to be processed at a higher clock rate and at almost the same IPC as in a conventional processor. Various flavors of the proposed architecture are simulated and analyzed in this paper, with a circuit-level analysis to determine the impact on the critical path delays. Results show that a processor that can fetch, decode, and commit eight instructions each cycle, with split pipelines for two two-source integer instructions and six zero/one-source integer instructions, can achieve a clock rate that is 15.8% faster than an 8-wide conventional processor while losing only 0.7% in IPC throughput for SPEC2000 benchmarks. Similarly, a 4-wide processor with split pipelines for one two-source integer instruction and three zero/one-source integer instructions can achieve a clock rate that is 19.69% faster than a 4-wide conventional processor while losing only 1.9% in IPC throughput.
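The clock-rate gains and IPC losses quoted above can be combined into a rough net throughput estimate. The sketch below is not a figure from the paper: it assumes performance scales as IPC times clock rate with a fixed instruction count, and the function name `net_speedup` is hypothetical.

```python
def net_speedup(clock_gain_pct: float, ipc_loss_pct: float) -> float:
    """Throughput relative to the conventional baseline.

    Assumption (not stated in the abstract): performance = IPC x clock rate,
    with the same instruction count on both designs.
    """
    return (1 + clock_gain_pct / 100) * (1 - ipc_loss_pct / 100)


# (clock-rate gain %, IPC loss %) as reported in the abstract
configs = {
    "8-wide, 2 two-source + 6 zero/one-source": (15.8, 0.7),
    "4-wide, 1 two-source + 3 zero/one-source": (19.69, 1.9),
}

for name, (clock_gain, ipc_loss) in configs.items():
    speedup = net_speedup(clock_gain, ipc_loss)
    print(f"{name}: about {100 * (speedup - 1):.1f}% net throughput gain")
```

Under this assumption the 8-wide configuration comes out at roughly 15.0% and the 4-wide configuration at roughly 17.4%, so the small IPC loss removes only a minor part of the clock-rate advantage.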
R. Sangireddy and J. Shah, "Operand-Load-Based Split Pipeline Architecture for High Clock Rate and Commensurable IPC," in IEEE Transactions on Parallel & Distributed Systems, vol. 19, no. 4, pp. 529-544, 2007.