SPMD (Single Program Multiple Data) models [9, 3] and other traditional models of data parallelism provide parallelism at the processor level. Barrier synchronization is defined at the level of processors: when a processor arrives at the barrier point early and waits for the others to arrive, no other useful work is done on that processor. Program restructuring is one way of minimizing such latencies. However, restructured programs tend to be error-prone and less portable. In this paper we discuss how multithreading can be used in data parallelism to mask delays due to application irregularity or processor load imbalance. The discussion is in the context of Coir [14, 16, 17], our object-oriented runtime system for parallelism, and concentrates on shared-memory systems. The sample application is an LU factorization algorithm for skyline sparse matrices. We discuss performance results on an IBM PowerPC-based symmetric multiprocessor system.
Citation:
Neelakantan Sundaresan, "Exploiting Delayed Synchronization Arrivals in Light-Weight Data Parallelism," 30th Hawaii International Conference on System Sciences (HICSS), Volume 1: Software Technology and Architecture, p. 606, 1997