Issue No. 05 - May (1993 vol. 42)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/12.223672
<p>Parallel processing systems whose memory hierarchies include cache or local memory are considered. Each processor in such a system has a local cache memory, and cache coherence is usually maintained with a write-invalidate protocol. In these systems, a problem called 'cache or local memory thrashing' can arise during the execution of parallel programs, when data moves back and forth unnecessarily between the caches or local memories of different processors. An approach that eliminates, or at least reduces, such movement for nested parallel loops is presented. It is based on the relations between array element accesses and the enclosed loop indexes. These relations can be used to assign each processor the iterations of the parallel loops in a loop nest that access the data already held in its cache or local memory. An algorithm is presented that calculates the correct iteration of a parallel loop from the loop indexes of the iterations previously executed on the same processor. The method benefits parallel code with nested loop structures in a wide range of applications. Experimental results show that the technique can achieve speedups of up to 2.</p>
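<p>The thrashing effect described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the loop nest (a serial outer loop over j enclosing a parallel inner loop over i that writes A[i + j]), the function names, and both assignment rules are hypothetical and chosen only for illustration. It counts how often an array element is written by a different processor than the one that wrote it last, which under a write-invalidate protocol would force the data to move between caches.</p>

```python
def count_transfers(n_outer, n_inner, n_procs, assign):
    """Count writes to an element by a processor other than its last writer."""
    owner = {}      # element index -> processor that last wrote it
    transfers = 0
    for j in range(n_outer):          # serial outer loop
        for i in range(n_inner):      # parallel inner loop (simulated serially)
            elem = i + j              # assumed access pattern: iteration (j, i) writes A[i + j]
            proc = assign(j, i, n_procs)
            if elem in owner and owner[elem] != proc:
                transfers += 1        # element must move to another cache
            owner[elem] = proc
    return transfers


def naive(j, i, n_procs):
    # Hypothetical baseline: iteration i always runs on processor i mod P,
    # regardless of which data that iteration touches.
    return i % n_procs


def aligned(j, i, n_procs):
    # Hypothetical affinity rule: use the relation elem = i + j so that the
    # processor that wrote A[i + j] before executes the iteration writing it again.
    return (i + j) % n_procs


naive_moves = count_transfers(8, 16, 4, naive)
aligned_moves = count_transfers(8, 16, 4, aligned)
print(naive_moves, aligned_moves)
```

<p>In this toy model the aligned rule keeps every element on one processor, while the naive rule moves an element on every reuse. Real loop nests have more than one array reference and bounded caches, so the sketch only conveys the idea of choosing iterations by the data a processor already holds.</p>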
iteration partition approach; memory hierarchies; local cache memory; write-invalidate protocol; cache coherence; local memory thrashing; parallel programs; nested parallel loops; array element accesses; enclosed loop indexes; parallel loops; loop nests; local memories; correct iteration; parallel code; nested loop structures; iterative methods; memory architecture; parallel programming; storage management.
J. Fang and M. Lu, "An Iteration Partition Approach for Cache or Local Memory Thrashing on Parallel Processing," in IEEE Transactions on Computers, vol. 42, no. 5, pp. 529-546, 1993.