Scalable Load and Store Processing in Latency-Tolerant Processors
January/February 2006 (vol. 26 no. 1)
pp. 30-39
New load and store processing algorithms let memory-latency-tolerant architectures sustain thousands of in-flight instructions without scaling cycle-critical, fully associative load and store queues. These algorithms rely on redoing some stores after fetching cache miss data from memory (to fix memory dependences). Doing so provides better power and area characteristics than constantly enforcing memory dependences among a large number of loads and stores, many of which have unknown addresses.
Index Terms:
Latency-tolerant processors, load and store, CAM
Citation:
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth T. Srinivasan, Konrad Lai, "Scalable Load and Store Processing in Latency-Tolerant Processors," IEEE Micro, vol. 26, no. 1, pp. 30-39, Jan./Feb. 2006, doi:10.1109/MM.2006.21