<p>Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches. The simulations indicate that to maximize memory performance, it is important to schedule blocks of consecutive iterations to execute on each processor, and then to adaptively prefetch single-word cache blocks to match the number of iterations scheduled. Prefetching multiple single-word cache blocks on a miss reduces the miss ratio by approximately 5% to 30% compared to a system with no prefetching. In addition, the proposed adaptive prefetching scheme further reduces the miss ratio while significantly reducing the false sharing among cache blocks compared to nonadaptive prefetching strategies. Reducing the false sharing causes fewer coherence invalidations to be generated, and thereby reduces the total network traffic. The impact of the prefetching and scheduling strategies on the temporal distribution of coherence invalidations also is examined. It is found that invalidations tend to be evenly distributed throughout the execution of parallel loops, but tend to be clustered when executing sequential program sections. The distribution of invalidations in both types of program sections is relatively insensitive to the prefetching and scheduling strategy.</p>
Index Terms: scheduling; buffer storage; shared memory systems; parallel programming; performance evaluation; parallel loop scheduling; prefetching; shared memory multiprocessor; trace-driven simulations; numerical Fortran programs; data caches; memory performance; single-word cache blocks; cache coherence; cache pollution; false sharing; guided self-scheduling
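
The abstract's central recommendation, scheduling blocks of consecutive iterations and matching the prefetch amount to the block size, can be illustrated with a minimal sketch. It is not the paper's simulator. It assumes the common formulation of guided self-scheduling, in which each request takes ceil(remaining / processors) consecutive iterations. The function names, the `max_blocks` cap, and the rule in `blocks_to_fetch` are hypothetical choices made only for this illustration.

```python
from math import ceil


def guided_chunks(n_iterations, n_processors):
    """Return (start, size) chunks of consecutive iterations.

    Assumed guided self-scheduling rule: each request takes
    ceil(remaining / n_processors) iterations, so chunks shrink
    as the loop nears completion.
    """
    chunks = []
    start = 0
    remaining = n_iterations
    while remaining > 0:
        size = ceil(remaining / n_processors)
        chunks.append((start, size))
        start += size
        remaining -= size
    return chunks


def blocks_to_fetch(chunk_size, max_blocks=8):
    """Hypothetical adaptive rule (not taken from the paper): on a miss,
    fetch as many single-word cache blocks as there are iterations in
    the current chunk, capped at max_blocks and never fewer than one."""
    return max(1, min(chunk_size, max_blocks))


# Pair each scheduled chunk with the number of blocks fetched on a miss.
schedule = [(start, size, blocks_to_fetch(size))
            for start, size in guided_chunks(20, 4)]
for start, size, fetch in schedule:
    print(f"iterations {start}..{start + size - 1}: fetch {fetch} block(s) on a miss")
```

Under this rule a processor holding a large chunk fetches several consecutive words per miss, while one holding a single iteration fetches only the missed word. That is the intuition behind the abstract's claim that adaptive prefetching reduces false sharing relative to a fixed prefetch amount.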

D. Lilja, "The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor," in IEEE Transactions on Parallel & Distributed Systems, vol. 5, no. , pp. 573-584, 1994.