Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (2005)
St. Louis, Missouri
Sept. 17, 2005 to Sept. 21, 2005
DOI: http://doi.ieeecomputersociety.org/10.1109/PACT.2005.23
Ilya Ganusov, Computer Systems Laboratory, Cornell University
Martin Burtscher, Computer Systems Laboratory, Cornell University
<p>This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thread running on another core. Our approach simply executes a copy of all non-control instructions in the prefetching core after they have executed in the primary core. On the way to the second core, each instruction's output is replaced by a prediction of the output that the n-th future instance of this instruction will likely produce. Speculatively executing the resulting instruction stream on the second core issues load requests for data that the main program will probably reference in the future. Unlike previously proposed thread-based prefetching approaches, our technique does not need any thread spawning points, features an adjustable lookahead distance, does not require complicated analyzers to extract prefetching threads, is recovery-free, and requires no storage for the prefetching threads. We demonstrate that for the SPECcpu2000 benchmark suite, our mechanism significantly increases prefetching coverage and improves the primary core's performance by 10% on average over a baseline that already includes an aggressive hardware stream prefetcher. We further show that our approach works well in combination with runahead execution.</p>
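The value-replacement step described in the abstract can be illustrated with a small software model. The abstract does not say which value predictor the hardware uses, so this sketch assumes a simple per-instruction stride predictor that only predicts when the latest stride matches the previous one; the names FutureValuePredictor, update_and_predict, and future_execution_trace are hypothetical. It models a single address-generating instruction inside a loop: each time the primary core produces an address, the predictor substitutes the value expected n instances later, and that predicted address is what a dependent load on the prefetching core would request.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StrideEntry:
    """Per-instruction state for a simple stride value predictor (assumed, not from the paper)."""
    last_value: int = 0
    stride: int = 0
    confident: bool = False  # True when the latest stride equals the previous stride


class FutureValuePredictor:
    """Hypothetical predictor: given an instruction's current output, guess the
    output its n-th future dynamic instance will produce (value + n * stride)."""

    def __init__(self, distance: int) -> None:
        self.distance = distance  # lookahead distance n
        self.table: Dict[int, StrideEntry] = {}  # indexed by instruction PC

    def update_and_predict(self, pc: int, value: int) -> Optional[int]:
        entry = self.table.get(pc)
        if entry is None:
            # First dynamic instance of this instruction: no stride known yet.
            self.table[pc] = StrideEntry(last_value=value)
            return None
        new_stride = value - entry.last_value
        entry.confident = new_stride == entry.stride
        entry.stride = new_stride
        entry.last_value = value
        if not entry.confident:
            return None  # no prediction, so this sketch issues no prefetch
        return value + self.distance * entry.stride


def future_execution_trace(addresses: List[int], pc: int = 0x40, distance: int = 4) -> List[int]:
    """Feed the addresses produced by one instruction on the primary core through
    the predictor and collect the addresses the prefetching core would load."""
    predictor = FutureValuePredictor(distance)
    prefetches: List[int] = []
    for addr in addresses:
        predicted = predictor.update_and_predict(pc, addr)
        if predicted is not None:
            prefetches.append(predicted)
    return prefetches


if __name__ == "__main__":
    # Primary core walks an array of 8-byte elements starting at 0x1000.
    addrs = [0x1000 + 8 * i for i in range(8)]
    print([hex(a) for a in future_execution_trace(addrs)])
    # ['0x1030', '0x1038', '0x1040', '0x1048', '0x1050', '0x1058']
```

With a lookahead distance of 4 and an 8-byte stride, each prefetch in this sketch targets the element four iterations ahead of the one the primary core just touched; changing distance moves that target further ahead or closer, mirroring the adjustable lookahead distance the abstract mentions.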
I. Ganusov and M. Burtscher, "Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors," in Proc. 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), St. Louis, MO, USA, 2005, pp. 350-360.