2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT) (2012)
Minneapolis, MN, USA
Sept. 19, 2012 to Sept. 23, 2012
ISBN: 978-1-5090-6609-4
pp: 325-334
Vijay Sathish, The University of Wisconsin-Madison, U.S.A.
Michael J. Schulte, Advanced Micro Devices, TX, U.S.A.
Nam Sung Kim, The University of Wisconsin-Madison, U.S.A.
ABSTRACT
State-of-the-art graphics processing units (GPUs) provide very high memory bandwidth, but the performance of many general-purpose GPU (GPGPU) workloads is still bounded by memory bandwidth. Although compression techniques have been adopted by commercial GPUs, they are used only for compressing texture and color data, not data for GPGPU workloads. Furthermore, the microarchitectural details of GPU compression are proprietary, and its performance benefits have not previously been published. In this paper, we first investigate the microarchitectural changes required to support lossless compression of data transferred between the GPU and its off-chip memory, which provides higher effective bandwidth. Second, by exploiting some characteristics of floating-point numbers in many GPGPU workloads, we propose applying lossless compression to floating-point numbers after truncating their least-significant bits (i.e., lossy compression). This reduces bandwidth usage even further with very little impact on overall computational accuracy. Finally, we demonstrate that a GPU with our lossless and lossy compression techniques can improve the performance of memory-bound GPGPU workloads by 26% and 41% on average, respectively.
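The lossy step described in the abstract (truncate least-significant bits, then compress losslessly) can be illustrated with a minimal software sketch. This is not the authors' hardware design: the abstract does not name the lossless algorithm, so `zlib` stands in for it here, and the function names, the choice of 16 dropped bits, and the synthetic sine data are all assumptions made purely for illustration.

```python
import math
import struct
import zlib


def truncate_float32(value: float, drop_bits: int) -> float:
    """Zero the `drop_bits` least-significant mantissa bits of a float32."""
    if not 0 <= drop_bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << drop_bits) & 0xFFFFFFFF
    (out,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return out


def compressed_size(values, drop_bits: int = 0) -> int:
    """Bytes left after truncation plus lossless compression.

    zlib is only a stand-in (assumption) for the unspecified lossless scheme.
    """
    payload = b"".join(
        struct.pack("<f", truncate_float32(v, drop_bits)) for v in values
    )
    return len(zlib.compress(payload))


def as_float32(value: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Hypothetical smooth data standing in for a GPGPU floating-point buffer.
data = [math.sin(0.001 * i) for i in range(4096)]

lossless_bytes = compressed_size(data, drop_bits=0)
lossy_bytes = compressed_size(data, drop_bits=16)

# Worst-case relative error introduced by dropping 16 mantissa bits.
max_rel_error = max(
    abs(truncate_float32(v, 16) - as_float32(v)) / abs(as_float32(v))
    for v in data
    if v != 0.0
)
```

Comparing `lossy_bytes` with `lossless_bytes` shows how zeroing low-order mantissa bits makes the byte stream more regular and therefore more compressible, while `max_rel_error` stays below 2^-7 for normal values because 7 mantissa bits remain. How many bits a real workload can tolerate losing depends on the application and is not something this sketch establishes.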
INDEX TERMS
Graphics processing units, Random access memory, Bandwidth, Instruction sets, Metadata, Microarchitecture, Memory management
CITATION
Vijay Sathish, Michael J. Schulte, Nam Sung Kim, "Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads", 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 325-334, 2012.