Caching Values in the Load Store Queue
12th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'04)
Volendam, The Netherlands, October 4-8, 2004
ISBN: 0-7695-2251-3
The latency of an L1 data cache continues to grow with increasing clock frequency, cache size, and associativity, and this increased latency is an important source of performance loss in high-performance processors. This paper proposes caching data using the existing Load/Store Queue (LSQ) hardware and data paths. This inexpensive cache requires very little additional hardware, yet it improves performance and reduces energy consumption. The modified LSQ "caches" all previously accessed data values, going beyond existing store-to-load forwarding techniques: both load and store data are placed in the LSQ and retained there after the corresponding memory access instruction has committed. A 128-entry modified LSQ allows an average of 51% of all loads in the SPECint2000 benchmarks to obtain their data from the LSQ. With a 1-cycle LSQ access latency and a 3-cycle L1 cache latency, performance on SPECint2000 improves by up to 7%, with an average speedup of over 4%.
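The mechanism can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' design: the names `ValueCachingLSQ`, `LSQEntry`, `load`, and `store` are hypothetical; addresses are word-granular; every load and store allocates one entry in program order; the oldest entry is evicted when the queue is full; stores write through to a dictionary standing in for the L1 cache; and the 1-cycle and 3-cycle latencies are the configuration quoted in the abstract. Speculation, commit ordering, and partial-word overlaps are ignored.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LSQEntry:
    address: int  # word-granular address (assumption: no partial-word accesses)
    value: int    # data loaded or stored by the instruction


class ValueCachingLSQ:
    """Hypothetical toy model: entries stay in the queue after commit and
    later loads to the same address are served from them."""

    def __init__(self, capacity=128, lsq_latency=1, l1_latency=3):
        # deque with maxlen drops the oldest entry when a new one is appended
        self.entries = deque(maxlen=capacity)
        self.memory = {}  # stand-in for the L1 cache and the rest of the hierarchy
        self.lsq_latency = lsq_latency
        self.l1_latency = l1_latency
        self.loads = 0
        self.lsq_hits = 0
        self.total_load_cycles = 0

    def store(self, address, value):
        # Store data is placed in the LSQ and (simplification) written through.
        self.entries.append(LSQEntry(address, value))
        self.memory[address] = value

    def load(self, address):
        self.loads += 1
        hit = False
        value = 0
        # Search youngest to oldest so the most recent value for the address wins.
        for entry in reversed(self.entries):
            if entry.address == address:
                hit, value = True, entry.value
                break
        if hit:
            self.lsq_hits += 1
            self.total_load_cycles += self.lsq_latency
        else:
            value = self.memory.get(address, 0)
            self.total_load_cycles += self.l1_latency
        # Every load occupies an entry, and its data is retained for later loads.
        self.entries.append(LSQEntry(address, value))
        return value

    def hit_rate(self):
        return self.lsq_hits / self.loads if self.loads else 0.0

    def average_load_latency(self):
        return self.total_load_cycles / self.loads if self.loads else 0.0


if __name__ == "__main__":
    lsq = ValueCachingLSQ(capacity=4)
    lsq.store(0x100, 7)
    lsq.load(0x100)  # hit: value comes from the retained store entry
    lsq.load(0x200)  # miss: read from the L1 stand-in
    lsq.load(0x200)  # hit: the previous load's value was retained
    print(f"LSQ hit rate: {lsq.hit_rate():.2f}")                  # 0.67
    print(f"Average load latency: {lsq.average_load_latency():.2f} cycles")  # 1.67
```

Using a `deque` with `maxlen` makes eviction follow allocation order, which mirrors how queue entries are reused. A hardware LSQ performs the address match associatively across all entries in parallel rather than with the sequential scan used here.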
Citation:
Dan Nicolaescu, Alex Veidenbaum, Alex Nicolau, "Caching Values in the Load Store Queue," mascots, pp.580-587, 12th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'04), 2004