Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques (1997)
San Francisco, CA
Nov. 11, 1997 to Nov. 15, 1997
J. Skeppstedt, Dept. of Comput. Eng., Chalmers Univ. of Technol., Goteborg, Sweden
In this paper we first identify limitations of compiler-controlled prefetching in a CC-NUMA multiprocessor with a write-invalidate cache coherence protocol. Compiler-controlled prefetch techniques for CC-NUMAs often focus only on stride-accesses, which introduces a major limitation. We consider combining prefetch with two other compiler-controlled techniques to partly remedy the situation: (1) load-exclusive to reduce write-latency and (2) store-update to reduce read-latency. In a machine with prefetch, the purpose of each of these techniques is to reduce latency for accesses that the prefetch technique could not handle. We evaluate two scenarios: first with a hybrid compiler/hardware prefetch technique and second with an optimal stride-prefetcher. We find that the combined gains under the hybrid prefetch technique are significant for the six applications we have studied: on average, 71% of the original write-stall time remains after using the hybrid prefetcher, and of the remaining ownership-requests, 60% would be eliminated using load-exclusive; on average, 68% of the read-stall time remains after using the hybrid prefetcher, and of the remaining read-misses, 34% were serviced by remote caches and would be converted by store-update into misses serviced by a clean copy in memory, which reduces the read-latency. With an optimal stride-prefetcher, our results show that it is beneficial to complement prefetch with the two techniques as well.
parallel architectures; prefetching; multiprocessors; compiler-initiated coherence; CC-NUMA multiprocessor; compiler-controlled prefetching; prefetch; read-stall time; write-latency; read-latency; memory access latency reduction; compiler-analysis; migratory sharing
J. Skeppstedt, "Overcoming Limitations Of Prefetching In Multiprocessors By Compiler-Initiated Coherence Action," Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques (PACT), San Francisco, CA, 1997, pp. 272.