2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
Sept. 11, 2016 to Sept. 15, 2016
Saumay Dublish, University of Edinburgh, United Kingdom
Because memory-intensive applications lack sufficient compute threads, GPUs often exhaust all of their active warps. As a result, memory latencies are exposed and appear on the critical path. In such a scenario, shared on-chip and off-chip memory bandwidth is more performance-critical to cores with few or no active warps than to cores with sufficient active warps. In this work, we use the slack of memory responses as a metric to identify how critical shared bandwidth is to each core. Based on this metric, we propose a slack-aware DRAM scheduling policy that prioritizes requests from cores with negative slack ahead of row-buffer hits. We also propose a request throttling mechanism that reduces the shared bandwidth demand of cores that have enough active warps to sustain execution. Together, these techniques reduce the memory latencies that appear on the critical path by shifting more of the latency onto requests whose latency can be hidden by multithreading.
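The abstract gives no implementation details, so the following is only a minimal Python sketch under stated assumptions, not the poster's actual design. It assumes each DRAM request carries a precomputed, hypothetical `slack` field (cycles the issuing core can still tolerate before stalling; negative means the core has already run out of ready warps). It also assumes a conventional row-hit-first, oldest-first (FR-FCFS-style) policy as the baseline. The sketch contrasts that baseline with the slack-aware ordering described above, in which negative-slack requests outrank row-buffer hits. The throttle uses hypothetical thresholds (`warp_threshold`, `throttled_limit`, `default_limit`) chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    core_id: int   # core (SM) that issued the request
    row: int       # DRAM row targeted by the request
    arrival: int   # cycle at which the request entered the queue
    slack: int     # HYPOTHETICAL: cycles the core can still hide; < 0 means already stalled


def pick_next_fr_fcfs(queue, open_row):
    """Assumed baseline: row-buffer hits first, then oldest request."""
    return min(queue, key=lambda r: (r.row != open_row, r.arrival))


def pick_next_slack_aware(queue, open_row):
    """Sketch of slack-aware scheduling.

    Priority order:
      1. requests with negative slack (their core is already stalled),
      2. row-buffer hits,
      3. oldest request first.
    """
    def priority(req):
        critical = req.slack < 0
        row_hit = req.row == open_row
        # False sorts before True, so negate each flag to rank True first.
        return (not critical, not row_hit, req.arrival)

    return min(queue, key=priority)


def may_issue(active_warps, outstanding,
              warp_threshold=8, throttled_limit=4, default_limit=32):
    """HYPOTHETICAL request throttle.

    A core with at least `warp_threshold` active warps is assumed to be able
    to hide latency, so its outstanding requests are capped at a lower limit
    to free shared bandwidth for latency-critical cores.
    """
    limit = throttled_limit if active_warps >= warp_threshold else default_limit
    return outstanding < limit
```

For example, consider a queue in which core 2's request is the oldest row-buffer hit and core 1's request misses the open row but has negative slack. The baseline picks core 2, whereas the slack-aware ordering picks core 1. Encoding the policy as a tuple sort key keeps each priority level explicit and makes it easy to compare against the baseline.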
Bandwidth, Memory management, Random access memory, System-on-chip, Measurement, Graphics processing units, Discrete wavelet transforms
S. Dublish, "Student research poster: Slack-aware shared bandwidth management in GPUs," 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), Haifa, Israel, 2016, pp. 451-452.