Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2002)
Charlottesville, Virginia
Sept. 22, 2002 to Sept. 25, 2002
ISSN: 1089-795X
ISBN: 0-7695-1620-3
pp: 189
Daniel Ortega, Universidad Politécnica de Cataluña
Eduard Ayguadé, Universidad Politécnica de Cataluña
Jean-Loup Baer, University of Washington
Mateo Valero, Universidad Politécnica de Cataluña
ABSTRACT
Ever-increasing memory latencies and deeper pipelines push memory farther from the processor. Prefetching techniques aim to bridge these two gaps by fetching data in advance into both the L1 cache and the register file. Our main contribution in this paper is a hybrid approach to the prefetching problem that combines software and hardware prefetching in a cost-effective way: it needs very little hardware support and has minimal impact on the design of the processor pipeline. The prefetcher is built on top of a static memory instruction bypassing mechanism, which is in charge of bringing prefetched values into the register file. We also present a thorough analysis of the limits of both prefetching and memory instruction bypassing, and we compare our prefetching technique with a prior speculative proposal that attacked the same problem. We show that, at much lower cost, our hybrid solution outperforms a realistic implementation of speculative prefetching and bypassing. On average, our hybrid implementation achieves a 13% speed-up over a version with software prefetching in a subset of numerical applications and an average of 43% over a version with no software prefetching (up to 102% for specific benchmarks).
CITATION
Daniel Ortega, Eduard Ayguadé, Jean-Loup Baer, Mateo Valero, "Cost-Effective Compiler Directed Memory Prefetching and Bypassing", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 189, 2002, doi:10.1109/PACT.2002.1106017