Proceedings.International Conference on Parallel Architectures and Compilation Techniques (2002)
Sept. 22, 2002 to Sept. 25, 2002
Daniel Ortega , Universidad Politécnica de Cataluña
Eduard Ayguadé , Universidad Politécnica de Cataluña
Jean-Loup Baer , University of Washington
Mateo Valero , Universidad Politécnica de Cataluña
Ever increasing memory latencies and deeper pipelines push memory farther from the processor. Prefetching techniques aim is to bridge these two gaps by fetching data in advance to both the L1 cache and the register file. Our main contribution in this paper is a hybrid approach to the prefetching problem that combines both software and hardware prefetching in a cost-effective way by needing very little hardware support and impacting minimally the design of the processor pipeline. The prefetcher is built on-top of a static memory instruction bypassing, which is in charge of bringing prefetched values in the register file. In this paper we also present a thorough analysis of the limits of both prefetching and memory instruction bypassing. We also compare our prefetching technique with a prior speculative proposal that attacked the same problem, and we show that at much lower cost, our hybrid solution is better than a realistic implementation of speculative prefetching and bypassing. In average, our hybrid implementation achieves a 13% speed-up improvement over a version with software prefetching in a subset of numerical applications and an average of 43% over a version with no software prefetching (achieving up to a 102% for specific benchmarks).
J. Baer, M. Valero, D. Ortega and E. Ayguadé, "Cost-Effective Compiler Directed Memory Prefetching and Bypassing," Proceedings.International Conference on Parallel Architectures and Compilation Techniques(PACT), Charlottesville, Virginia, 2002, pp. 189.