2011 International Conference on Parallel Architectures and Compilation Techniques (2011)
Galveston, Texas USA
Oct. 10, 2011 to Oct. 14, 2011
ISSN: 1089-795X
ISBN: 978-0-7695-4566-0
pp: 216
Today, almost all desktop and laptop computers are shared-memory multicores, but the code they run is overwhelmingly serial. High-level language extensions and libraries (e.g., OpenMP, Cilk++, TBB) make it much easier for programmers to write parallel code than with previous approaches (e.g., MPI), in large part thanks to the efficient work-stealing scheduler, which allows the programmer to expose more parallelism than the hardware actually provides. But when the parallel tasks are too short or too numerous, the scheduling overheads become significant and hurt performance. Because this happens frequently (e.g., data parallelism, PRAM algorithms), programmers need to coarsen tasks manually for performance by combining many of them into longer tasks. But manual coarsening typically overfits the code to the input data, platform, and context used to do the coarsening, and harms performance portability. We propose distinguishing between two types of coarsening and using different techniques for each. We then improve on our previous work on Lazy Binary Splitting (LBS), a scheduler that performs the second type of coarsening dynamically but fails to scale on large commercial multicores. Our improved scheduler, Breadth-First Lazy Scheduling (BF-LS), overcomes the scalability issue of LBS and performs much better on large machines.
parallel programming, run-time scheduling, lazy work-stealing, productivity
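The manual coarsening that the abstract describes can be illustrated with a minimal sketch. It is not the paper's scheduler (neither LBS nor BF-LS) and runs serially: it only shows how a hand-picked grain size fixes the number of tasks a loop is split into. The names `split_range`, `coarsened_sum`, and `grain` are hypothetical and chosen for illustration.

```python
# Illustrative sketch only: eager binary splitting with a fixed, hand-tuned
# grain size. All names here are assumptions, not taken from the paper.

def split_range(lo, hi, grain):
    """Split [lo, hi) in half until each piece has at most `grain` iterations."""
    if hi - lo <= grain:
        return [(lo, hi)]  # small enough: this piece becomes one task
    mid = lo + (hi - lo) // 2
    return split_range(lo, mid, grain) + split_range(mid, hi, grain)


def coarsened_sum(data, grain):
    """Sum `data` piece by piece; return the total and the number of tasks."""
    tasks = split_range(0, len(data), grain)
    # A real runtime would hand each piece to a worker; here they run serially.
    total = sum(sum(data[lo:hi]) for lo, hi in tasks)
    return total, len(tasks)


data = list(range(1024))
fine_total, fine_tasks = coarsened_sum(data, grain=1)        # one task per element
coarse_total, coarse_tasks = coarsened_sum(data, grain=128)  # far fewer tasks
print(fine_tasks, coarse_tasks)
```

Both calls compute the same sum, but the task count differs by the grain factor. A grain value tuned for one input size or machine is exactly the kind of hard-coded choice that the abstract says harms performance portability.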

R. Barua, U. Vishkin and A. Tzannes, "Improving Run-Time Scheduling for General-Purpose Parallel Code," 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT), Galveston, Texas USA, 2011, pp. 216.