2006 International Conference on Parallel Architectures and Compilation Techniques (PACT) (2006)
Seattle, WA, USA
Sept. 16, 2006 to Sept. 20, 2006
DOI Bookmark: http://doi.ieeecomputersociety.org/
Ilya Ganusov, Computer Systems Laboratory, Cornell University, Ithaca, New York
Martin Burtscher, Computer Systems Laboratory, Cornell University, Ithaca, New York
The advance of multi-core architectures provides significant benefits for parallel and throughput-oriented computing, but the performance of individual computation threads does not improve and may even suffer a penalty because of the increased contention for shared resources. This paper explores the idea of using available general-purpose cores in a CMP as helper engines for individual threads running on the active cores. We propose a lightweight architectural framework for efficient event-driven software emulation of complex hardware accelerators and describe how this framework can be applied to implement a variety of prefetching techniques. We demonstrate the viability and effectiveness of our framework on a wide range of applications from the SPEC CPU2000 and Olden benchmark suites. On average, our mechanism provides performance benefits within 5% of pure hardware implementations. Furthermore, we demonstrate that running event-driven prefetching threads on top of a baseline with a hardware stride prefetcher yields significant speedups for many programs. Finally, we show that our approach provides competitive performance improvements over other hardware approaches for multi-core execution while executing fewer instructions and requiring considerably less hardware support.
multi-core architectures, prefetching, helper threading
I. Ganusov and M. Burtscher, "Efficient emulation of hardware prefetchers via event-driven helper threading," 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT), Seattle, WA, USA, 2006, pp. 144-153.