2013 IEEE International Conference on Cluster Computing (CLUSTER) (2013)
Indianapolis, IN, USA
Sept. 23, 2013 to Sept. 27, 2013
Lena Oden, Fraunhofer Institute & University of Heidelberg, Germany
Holger Froning, University of Heidelberg, Germany
Modern GPUs are powerful high-core-count processors, which are no longer used solely for graphics applications, but are also employed to accelerate computationally intensive general-purpose tasks. For utmost performance, GPUs are distributed throughout the cluster to process parallel programs. In fact, many recent high-performance systems in the TOP500 list are heterogeneous architectures. Despite being highly effective processing units, GPUs on different hosts are incapable of communicating without assistance from a CPU. As a result, communication between distributed GPUs suffers from unnecessary overhead, introduced by switching control flow from GPUs to CPUs and vice versa. Most communication libraries even require intermediate copies from GPU memory to host memory. This overhead in particular penalizes small data movements and synchronization operations, reduces efficiency and limits scalability. In this work we introduce global address spaces to facilitate direct communication between distributed GPUs without CPU involvement. Avoiding context switches and unnecessary copying dramatically reduces communication overhead. We evaluate our approach using a variety of workloads including low-level latency and bandwidth benchmarks, basic synchronization primitives like barriers, and a stencil computation as an example application. We see performance benefits of up to 2× for basic benchmarks and up to 1.67× for stencil computations.
bulk-synchronous execution, parallel processing, hybrid computing clusters, GPU communication
L. Oden and H. Froning, "GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters," 2013 IEEE International Conference on Cluster Computing (CLUSTER), Indianapolis, IN, USA, 2013, pp. 1-8.