Adrian Cristal, Daniel Ortega, Josep Llosa, Mateo Valero, "Out-of-Order Commit Processors," 10th International Symposium on High-Performance Computer Architecture (HPCA'04), p. 48, IEEE Computer Society, 2004. DOI: 10.1109/HPCA.2004.10008.
Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is particularly useful in numerical applications, where branch speculation is normally not a problem and where the cache hierarchy cannot deliver data soon enough. Supporting more in-flight instructions requires up-sizing several resources: the Reorder Buffer (ROB), the general-purpose instruction queues, the Load/Store Queue, and the number of physical registers. However, scaling up these resources is impractical because of area, cycle-time, and power-consumption constraints.
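Why so many in-flight instructions? A back-of-the-envelope sketch using Little's law (occupancy = throughput × latency): to keep issuing at full width while the oldest instruction waits on memory, the window must hold roughly the issue width times the miss latency. The function name and the widths and latencies used below are illustrative assumptions, not figures from the paper.

```python
def inflight_needed(issue_width: int, memory_latency_cycles: int) -> int:
    """Little's law estimate (hypothetical helper): occupancy = throughput x latency.

    To keep issuing `issue_width` instructions per cycle while the oldest
    instruction waits `memory_latency_cycles` for memory, roughly this many
    instructions must be in flight, each holding ROB/queue/register entries.
    """
    return issue_width * memory_latency_cycles


if __name__ == "__main__":
    # Illustrative values only: a 4-wide core facing various miss latencies.
    for latency in (100, 500, 1000):
        print(f"4-wide, {latency}-cycle miss -> ~{inflight_needed(4, latency)} in-flight instructions")
```

Under these assumed numbers, a 4-wide core facing a 500-cycle miss would need on the order of 2000 instructions in flight, far beyond what a conventional ROB provides.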
In this paper we propose increasing the number of in-flight instructions that future processors can sustain. Instead of simply up-sizing resources, we propose novel microarchitectural structures that achieve the same performance benefits with far fewer resources. Our main contribution is a new checkpointing mechanism that can keep thousands of instructions in flight at a practically constant cost. We also propose a queuing mechanism that exploits the differences in how long instructions in the flow wait before they can issue.
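The abstract does not detail the checkpointing mechanism, so the following is only a minimal sketch of the general idea of checkpoint-based commit, under stated assumptions: a checkpoint is opened every `interval` instructions, each checkpoint tracks only a count of its uncompleted instructions, the oldest checkpoint commits as a unit once that count reaches zero, and recovery rolls back to a checkpoint. The class and method names (`CheckpointCommit`, `dispatch`, `complete`, `rollback`) are hypothetical. The sketch illustrates how bookkeeping can scale with the number of checkpoints rather than with the number of in-flight instructions; the queuing mechanism is not modeled.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Checkpoint:
    cp_id: int
    first_seq: int    # sequence number of the first instruction it covers
    pending: int = 0  # instructions dispatched but not yet completed


class CheckpointCommit:
    """Hypothetical checkpoint-based commit: state is kept per checkpoint,
    not per instruction (an instruction only carries its checkpoint tag)."""

    def __init__(self, interval: int):
        self.interval = interval  # assumed fixed checkpoint spacing
        self.checkpoints: deque[Checkpoint] = deque()
        self.next_seq = 0
        self.next_cp = 0
        self.committed = 0  # instructions committed so far

    def dispatch(self) -> tuple[int, int]:
        """Dispatch one instruction; return (sequence number, checkpoint tag)."""
        if not self.checkpoints or self.next_seq - self.checkpoints[-1].first_seq >= self.interval:
            self.checkpoints.append(Checkpoint(self.next_cp, self.next_seq))
            self.next_cp += 1
        cp = self.checkpoints[-1]
        cp.pending += 1
        seq = self.next_seq
        self.next_seq += 1
        return seq, cp.cp_id

    def complete(self, cp_id: int) -> None:
        """Mark one instruction of checkpoint `cp_id` as completed."""
        for cp in self.checkpoints:
            if cp.cp_id == cp_id:
                cp.pending -= 1
                break  # completions for squashed checkpoints are ignored
        self._try_commit()

    def _try_commit(self) -> None:
        # The oldest checkpoint commits all its instructions at once when none
        # are pending; the youngest stays open because it may still grow.
        while len(self.checkpoints) > 1 and self.checkpoints[0].pending == 0:
            done = self.checkpoints.popleft()
            self.committed += self.checkpoints[0].first_seq - done.first_seq

    def rollback(self, cp_id: int) -> int:
        """Squash checkpoint `cp_id` and all younger ones; return the restart point."""
        restart = self.next_seq
        while self.checkpoints and self.checkpoints[-1].cp_id >= cp_id:
            restart = self.checkpoints.pop().first_seq
        self.next_seq = restart
        return restart


if __name__ == "__main__":
    cc = CheckpointCommit(interval=4)
    tags = [cc.dispatch() for _ in range(10)]  # checkpoints 0, 1, 2
    for _, cp in tags[:4]:
        cc.complete(cp)
    print(cc.committed)  # 4: checkpoint 0 commits as a unit
```

With `interval=4` and ten dispatched instructions, completing the four instructions of checkpoint 0 commits them together, while checkpoints 1 and 2 remain in flight; no per-instruction ROB entry is needed to decide when to commit.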
Using these two mechanisms, our processor performs only 10% worse on SPEC2000fp than a conventional processor that requires more than an order of magnitude more entries in the ROB and instruction queues, and achieves about a 200% improvement over a current processor with a similar number of entries.
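Reading the two figures together, a small worked calculation shows what they imply about the two baselines. This assumes that "200% improvement" means 3× the performance and that both comparisons use the same SPEC2000fp measurements; the normalization to 1.0 is arbitrary.

```python
# Hypothetical normalization: the large-window conventional processor = 1.0
large_window = 1.0
proposed = large_window * (1 - 0.10)  # "only 10%" degradation vs. the large window
similar_size = proposed / (1 + 2.00)  # "about a 200% improvement", read as 3x

print(f"proposed / similar-size:     {proposed / similar_size:.1f}x")
print(f"large window / similar-size: {large_window / similar_size:.2f}x")
```

Under this reading, the large-window processor would be roughly 3.3× faster than the similar-sized one, and the proposed design recovers about 90% of that gap's upper end with a similar resource budget.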
