Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2007)
Sept. 15, 2007 to Sept. 19, 2007
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PACT.2007.10
Rajesh Vivekanandharn, Indian Institute of Science, India
R. Govindarajan, Indian Institute of Science, India
Out-of-order superscalar processors must be able to issue loads while older stores are in flight. Forcing loads to wait until all older stores, including those on which they may not depend, retire and write to the cache would reduce IPC and take away almost all the benefit of out-of-order execution. On the other hand, maintaining functional correctness while loads execute alongside in-flight stores requires forwarding data from the most recent older in-flight store to the same address. Such forwarding typically involves a CAM match against the 64-bit physical address field of each store queue entry. The store queue data forwarding logic is thus a high-latency circuit and could limit the clock frequency of the design.
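The forwarding rule described in the abstract can be illustrated with a minimal software sketch. This is not the design proposed in the paper; it only models the baseline behavior of a conventional store queue. The names `StoreEntry`, `seq`, and `forward_from_store_queue` are hypothetical, and the sketch assumes full-address matches between same-size, aligned accesses. The sequential loop stands in for what hardware would do in parallel with a CAM and a priority encoder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    seq: int   # program-order sequence number (hypothetical field)
    addr: int  # physical address of the store
    data: int  # value waiting to be written to the cache


def forward_from_store_queue(
    store_queue: List[StoreEntry], load_seq: int, load_addr: int
) -> Optional[int]:
    """Return the data of the youngest store that is older than the load
    and writes the same address, or None if the load must read the cache."""
    best: Optional[StoreEntry] = None
    for entry in store_queue:
        # Only stores older than the load are candidates; this comparison
        # against every entry is what the CAM match performs in hardware.
        if entry.seq < load_seq and entry.addr == load_addr:
            # Among matching stores, keep the most recent one.
            if best is None or entry.seq > best.seq:
                best = entry
    return best.data if best is not None else None


# Example: three in-flight stores to 0x100 and one to 0x200.
store_queue = [
    StoreEntry(seq=1, addr=0x100, data=11),
    StoreEntry(seq=3, addr=0x100, data=33),
    StoreEntry(seq=5, addr=0x200, data=55),
    StoreEntry(seq=7, addr=0x100, data=77),
]
forwarded = forward_from_store_queue(store_queue, load_seq=6, load_addr=0x100)
```

In this example the load with sequence number 6 receives the value from the store with sequence number 3: the store with sequence number 7 is younger than the load and is ignored. Because every entry takes part in the address comparison, the cost of the search grows with the number of store queue entries, which is the scaling concern the abstract raises.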
Rajesh Vivekanandharn, R. Govindarajan, "A Scalable Low Power Store Queue for Large Instruction Window Processors", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, vol. 00, no. , pp. 430, 2007, doi:10.1109/PACT.2007.10