16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007)
Sept. 15, 2007 to Sept. 19, 2007
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PACT.2007.10
Rajesh Vivekanandharn , Indian Institute of Science, India
R. Govindarajan , Indian Institute of Science, India
Out-of-order superscalar processors must be able to issue loads while older stores are still in flight. Forcing loads to wait until all older stores, including those on which they may not depend, retire and write to the cache would reduce IPC and take away almost all the benefit of out-of-order execution. On the other hand, maintaining functional correctness while loads execute in the presence of in-flight stores requires the ability to forward data from the most recent older in-flight store to the same address. Such forwarding typically involves a CAM match against the 64-bit physical address field of each store queue entry. The store queue data-forwarding logic is thus a high-latency circuit and could limit the frequency of the design.
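As a rough illustration of the forwarding rule described above, the following sketch models a store queue in software and selects the most recent older store to the same address. It is not the design proposed in the paper: the names `StoreEntry` and `forward_from_store_queue` are hypothetical, sequence numbers stand in for program order, and the model assumes full-word accesses with exact address matches (no partial overlaps). The linear scan plays the role that the parallel CAM match plays in hardware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    """One in-flight store (hypothetical model, not the paper's structure)."""
    seq: int    # program-order sequence number; smaller means older
    addr: int   # physical address written by the store
    data: int   # value to be written


def forward_from_store_queue(
    store_queue: List[StoreEntry], load_seq: int, load_addr: int
) -> Optional[int]:
    """Return the data of the youngest store that is older than the load
    and writes the same address, or None if the load must read the cache."""
    best: Optional[StoreEntry] = None
    for entry in store_queue:
        # Hardware compares every entry's address in parallel (CAM match);
        # this loop performs the same comparison sequentially.
        if entry.seq < load_seq and entry.addr == load_addr:
            if best is None or entry.seq > best.seq:
                best = entry
    return best.data if best is not None else None


# Example: four in-flight stores, three of them to address 0x1000.
sq = [
    StoreEntry(seq=1, addr=0x1000, data=11),
    StoreEntry(seq=3, addr=0x2000, data=22),
    StoreEntry(seq=5, addr=0x1000, data=33),
    StoreEntry(seq=9, addr=0x1000, data=44),
]

hit = forward_from_store_queue(sq, load_seq=7, load_addr=0x1000)
miss = forward_from_store_queue(sq, load_seq=7, load_addr=0x3000)
print(hit, miss)
```

A load with sequence number 7 skips the store with sequence number 9 because that store is younger than the load, and it prefers the store with sequence number 5 over the one with sequence number 1 because it is the most recent older match. Every entry has to be compared on every load, which is why the cost of this match grows with the size of the store queue.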
R. Vivekanandharn and R. Govindarajan, "A Scalable Low Power Store Queue for Large Instruction Window Processors," 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), Brasov, Romania, 2007, pp. 430.