Issue No.03 - March (2001 vol.12)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.914756
<p><b>Abstract</b>—Wavefront parallelism, in which parallelism is limited to hyperplanes in an iteration space, can arise when compilers apply <i>tiling</i> to loop nests to enhance locality. Previous approaches for scheduling wavefront parallelism focused on maximizing parallelism, balancing workloads, and reducing synchronization. In this paper, we show that on large-scale shared-memory multiprocessors, locality is a crucial factor. We make the distinction between <i>intratile</i> and <i>intertile</i> locality and show that as the number of processors grows, intertile locality becomes more important. We consider and experimentally evaluate existing strategies for scheduling wavefront parallelism. We show that dynamic self-scheduling can be efficiently used on a small number of processors, but performs poorly at large scale because it does not enhance intertile locality. By contrast, static scheduling strategies enhance intertile locality for small tiles, maintaining parallelism and resulting in better performance at large scale. Results from a Convex SPP1000 multiprocessor demonstrate the importance of taking intertile locality into account. Static scheduling outperforms dynamic self-scheduling by a factor of up to 2.3 on 30 processors.</p>
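The wavefront structure described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not specify the scheduling strategies in detail, so the 2-D tile space, the dependence pattern (each tile depends on its north and west neighbors), the column-cyclic static mapping, and all function names below are assumptions chosen for illustration. Under those assumptions, tiles on the same hyperplane i + j are independent, and a static mapping that keeps each tile column on one processor lets a tile run where its north neighbor just ran, which is one way intertile locality could arise.

```python
def wavefronts(n_rows, n_cols):
    """Group tile coordinates (i, j) by the hyperplane i + j.

    Assumes each tile depends only on (i-1, j) and (i, j-1), so all
    tiles within one returned group can execute in parallel.
    """
    fronts = [[] for _ in range(n_rows + n_cols - 1)]
    for i in range(n_rows):
        for j in range(n_cols):
            fronts[i + j].append((i, j))
    return fronts


def static_owner(tile, n_procs):
    """Hypothetical static policy: tile column j is owned by processor j mod P."""
    _, j = tile
    return j % n_procs


def north_neighbor_reuse(n_rows, n_cols, owner):
    """Count tiles that run on the same processor as their north neighbor.

    `owner` maps a tile (i, j) to a processor id. A higher count is used
    here as a rough, illustrative proxy for intertile locality.
    """
    return sum(
        1
        for i in range(1, n_rows)
        for j in range(n_cols)
        if owner((i, j)) == owner((i - 1, j))
    )


if __name__ == "__main__":
    fronts = wavefronts(3, 3)
    for w, front in enumerate(fronts):
        print(f"wavefront {w}: {front}")
    # With the column-cyclic policy, every tile below the first row
    # shares a processor with its north neighbor.
    print(north_neighbor_reuse(4, 4, lambda t: static_owner(t, 2)))
```

Any other assignment policy can be passed as `owner` to compare against the static mapping. A real dynamic self-scheduler assigns tiles at run time in a nondeterministic order, so it is deliberately not modeled here.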
Keywords: High-performance compilers, wavefront parallelism, cache locality, locality-enhancing loop transformations, tiling, large-scale shared-memory multiprocessors.
Naraig Manjikian, Tarek S. Abdelrahman, "Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors", IEEE Transactions on Parallel & Distributed Systems, vol.12, no. 3, pp. 259-271, March 2001, doi:10.1109/71.914756