Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2007)
Sept. 15, 2007 to Sept. 19, 2007
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PACT.2007.53
Katherine Yelick, University of California at Berkeley, USA
Costin Iancu, Lawrence Berkeley National Laboratory, USA
Wei Chen, University of California at Berkeley, USA
As high-end computing systems continue to scale in CPU computational power and overall node count, optimization techniques that reduce communication overhead have proven important. Communication optimizations have been explored in the context of parallelizing compilers and data-parallel languages. Most of these studies were performed using MPI as the communication library, at a time when networks had relatively high latency and low bandwidth. As a result, most techniques concentrate on eliminating redundant messages and on reducing message count through aggregation. Research on recent networks has shown that significant performance improvements can be achieved using fine-grained communication decomposition and overlap.
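The trade-off between aggregation and fine-grained decomposition with overlap can be illustrated with a minimal cost model. The sketch below is not taken from the paper: the function names, the parameters, and the simple latency-plus-size-over-bandwidth model are all assumptions made for illustration. It compares one aggregated, blocking transfer followed by computation against a transfer split into equal chunks, where computation on one chunk proceeds while the next chunk is in flight.

```python
def blocking_time(n_bytes, latency, bandwidth, compute_per_byte):
    """Hypothetical model: one aggregated message, then compute on all data.

    latency is a fixed per-message cost in seconds, bandwidth is in
    bytes per second, compute_per_byte is in seconds per byte.
    """
    return latency + n_bytes / bandwidth + n_bytes * compute_per_byte


def overlapped_time(n_bytes, chunks, latency, bandwidth, compute_per_byte):
    """Hypothetical model: split the transfer into equal chunks and overlap.

    Assumes transfers are issued back to back, computation on a chunk
    starts only after that chunk has arrived, and chunks are processed
    in order. This is a two-stage pipeline with uniform stage times.
    """
    chunk = n_bytes / chunks
    transfer = latency + chunk / bandwidth   # time to move one chunk
    compute = chunk * compute_per_byte       # time to process one chunk
    # The first transfer cannot be hidden and the last compute cannot be
    # hidden; in between, the slower stage sets the pace.
    return transfer + (chunks - 1) * max(transfer, compute) + compute


if __name__ == "__main__":
    n = 1_000_000  # bytes to move (illustrative value)
    for label, lat in (("low latency", 1e-6), ("high latency", 1e-3)):
        agg = blocking_time(n, lat, 1e9, 1e-9)
        dec = overlapped_time(n, 10, lat, 1e9, 1e-9)
        print(f"{label}: aggregated={agg:.6f}s decomposed={dec:.6f}s")
```

Under this model, a high per-message latency penalizes every extra chunk, so aggregation is preferable; when latency is small relative to transfer and compute time, decomposition lets most of the communication hide behind computation. Real networks and runtimes have additional costs that the model ignores.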
Katherine Yelick, Costin Iancu, Wei Chen, "Performance Portable Optimizations for Loops Containing Communication Operations", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, p. 411, 2007, doi:10.1109/PACT.2007.53