Parallel and Distributed Processing Symposium, International (2010)
Atlanta, GA, USA
Apr. 19, 2010 to Apr. 23, 2010
Long Chen, Department of Electrical & Computer Engineering, University of Delaware, Newark, DE 19716
Oreste Villa, High Performance Computing, Pacific Northwest National Laboratory, Richland, WA 99352
Sriram Krishnamoorthy, High Performance Computing, Pacific Northwest National Laboratory, Richland, WA 99352
Guang R. Gao, Department of Electrical & Computer Engineering, University of Delaware, Newark, DE 19716
The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular and unbalanced workloads. The problem is exacerbated when trying to exploit concurrently the multiple GPUs that are commonly available in modern systems. In this paper, we propose a task-based dynamic load-balancing solution for single- and multi-GPU systems. The solution allows load balancing at a finer granularity than what is supported in current GPU programming APIs, such as NVIDIA's CUDA. We evaluate our approach using both micro-benchmarks and a molecular dynamics application that exhibits significant load imbalance. Experimental results with a single-GPU configuration show that our fine-grained task solution can utilize the hardware more efficiently than the CUDA scheduler for unbalanced workloads. On multi-GPU systems, our solution achieves near-linear speedup, load balance, and significant performance improvement over techniques based on standard CUDA APIs.
L. Chen, O. Villa, S. Krishnamoorthy and G. R. Gao, "Dynamic load balancing on single- and multi-GPU systems," 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta, GA, 2010, pp. 1-12.