2014 43rd International Conference on Parallel Processing Workshops (ICPPW) (2014)
Minneapolis, MN, USA
Sept. 9, 2014 to Sept. 12, 2014
ISSN: 1530-2016
ISBN: 978-1-4799-5615-9
pp: 411-418
In High-Performance Computing (HPC), GPU-based accelerators are pervasive for two reasons. First, GPUs provide much higher raw computational power than traditional CPUs. Second, power consumption increases sub-linearly with performance, making GPUs much more energy-efficient than CPUs in terms of GFLOPS/Watt. Although these advantages are limited to a selected set of workloads, most HPC applications can benefit substantially from GPUs. The top 11 entries of the current Green500 list (November 2013) are all GPU-accelerated systems, which supports these statements. For system architects, however, the use of GPUs is challenging, as their architecture is based on thread-collaborative execution and differs significantly from CPUs, which are mainly optimized for single-thread performance. The interfaces to other devices in a system, in particular the network device, are still optimized solely for CPUs. This makes GPU-controlled I/O a challenge, although it is desirable for savings in energy and time. This is especially true for network devices, which are a key component in HPC systems. In previous work we have shown that GPUs can directly source and sink network traffic for InfiniBand devices without any involvement of the host CPUs, but this approach does not provide any performance benefits. Here we explore another API for Put/Get operations that can overcome some limitations. In particular, we provide detailed reasoning about the issues that prevent performance advantages when directly controlling I/O from the GPU domain.
Graphics processing units, Data transfer, Performance evaluation, Bandwidth, Instruction sets, Kernel, Programming

B. Klenk, L. Oden and H. Froening, "Analyzing Put/Get APIs for Thread-Collaborative Processors," 2014 43rd International Conference on Parallel Processing Workshops (ICPPW), Minneapolis, MN, USA, 2014, pp. 411-418.