Parallel and Distributed Processing Symposium, International (2001)
San Francisco, California, USA
Apr. 23, 2001 to Apr. 27, 2001
A large class of scientific applications consists of irregular reductions on large data sets. On shared-memory multiprocessors these reductions are typically parallelized by computing partial results into replicated buffers, then combining the values into shared data using synchronization. Recently, a number of alternative techniques have been developed based on selective privatization, local writes, and synchronized writes. In this paper, we present a more efficient version of the local write algorithm that is 56% faster on average. We then experimentally compare the performance of each technique using a number of representative kernels. Results show that speedups vary greatly depending on application characteristics such as connectivity, locality, and adaptivity. In general, we find the local write technique provides the best performance, particularly when applications display good locality.
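As an illustration of the replicated-buffer approach described above, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical edge-based kernel in which each edge (u, v) adds a weight to x[u] and subtracts it from x[v]; the function names, the cyclic assignment of iterations to workers, and the use of Python threads are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_reduction(n_nodes, edges, weights):
    """Reference version: one loop with indirect (irregular) updates to x."""
    x = [0.0] * n_nodes
    for (u, v), w in zip(edges, weights):
        x[u] += w
        x[v] -= w
    return x


def replicated_buffer_reduction(n_nodes, edges, weights, n_workers=4):
    """Each worker accumulates into a private full-size copy of x,
    and the private copies are then summed into the shared result."""

    def partial(worker_id):
        buf = [0.0] * n_nodes  # replicated buffer, private to this worker
        # Cyclic distribution of iterations (an assumption of this sketch).
        for i in range(worker_id, len(edges), n_workers):
            u, v = edges[i]
            buf[u] += weights[i]
            buf[v] -= weights[i]
        return buf

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        buffers = list(pool.map(partial, range(n_workers)))

    # Combine step. Here a single thread sums the buffers after all workers
    # have finished, which stands in for the synchronized combine.
    x = [0.0] * n_nodes
    for buf in buffers:
        for j, val in enumerate(buf):
            x[j] += val
    return x


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
result = replicated_buffer_reduction(4, edges, weights, n_workers=2)
```

The sketch makes the cost of the approach visible: every worker allocates and later combines a buffer as large as the shared array, regardless of how few elements it actually updates.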
H. Han and C. Tseng, "A Comparison of Parallelization Techniques for Irregular Reductions," Parallel and Distributed Processing Symposium, International (IPDPS), San Francisco, California, USA, 2001, pp. 10027.