2006 International Conference on Parallel Architectures and Compilation Techniques (PACT) (2006)
Seattle, WA, USA
Sept. 16, 2006 to Sept. 20, 2006
ISBN: 978-1-5090-3022-4
pp: 275-284
Zhen Yang, Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
Xudong Shi, Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
Feiqi Su, Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
Jih-Kwon Peir, Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
ABSTRACT
Modern out-of-order processors with non-blocking caches exploit Memory-Level Parallelism (MLP) by overlapping cache misses in a wide instruction window. The exploitation of MLP, however, can be limited by long-latency operations that produce the base address of a cache-miss load. When the parent instruction is itself a cache-miss load, the two loads must be serialized to satisfy the load-load data dependence. In this paper, we propose a mechanism that dynamically captures load-load data dependences at runtime. A special Preload is issued in place of the dependent load without waiting for the parent load, thus effectively overlapping the two loads. The Preload provides the information the memory controller needs to calculate the correct memory address as soon as the parent's data is available, eliminating any interconnect delay between the two loads. Performance evaluations based on SPEC2000 and Olden applications show that the Preload achieves speedups of up to 40%, with an average of 16%. In conjunction with other aggressive MLP exploitation methods, such as runahead execution, the Preload yields a larger average improvement of 22%.
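The load-load dependence described in the abstract is typical of pointer-chasing code. As a minimal sketch (the `node` type and `sum_list` function are hypothetical names chosen for illustration, not taken from the paper), a linked-list traversal in C shows where the parent and dependent loads arise:

```c
#include <stddef.h>

/* Hypothetical node type, used only for illustration. */
struct node {
    int value;
    struct node *next;
};

/* Sum the values in a linked list.
 *
 * The load of p->next acts as the parent load: its result becomes the
 * base address for the loads in the next iteration (the dependent
 * loads). If both the parent and a dependent load miss in the cache,
 * the dependent miss cannot be issued until the parent's data returns,
 * so the two misses are serialized rather than overlapped. */
long sum_list(const struct node *p)
{
    long total = 0;
    while (p != NULL) {
        total += p->value;  /* dependent load: base address is p */
        p = p->next;        /* parent load for the next iteration */
    }
    return total;
}
```

Here the address of each node is known only after the previous node's `next` field has been loaded, which is the kind of serialization the Preload is intended to overlap.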
INDEX TERMS
Memory-Level Parallelism, Data Prefetching, Pointer-Chasing Loads, Instruction and Issue Window
CITATION
Zhen Yang, Xudong Shi, Feiqi Su, Jih-Kwon Peir, "Overlapping dependent loads with addressless preload", 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT), vol. 00, no. , pp. 275-284, 2006, doi: