2016 IEEE International Conference on Cluster Computing (CLUSTER) (2016)
Sept. 12, 2016 to Sept. 16, 2016
Remote GPU execution has been shown to increase GPU occupancy and reduce job waiting time in multi-GPU batch-queue systems by allowing jobs to use remote GPUs when not enough unoccupied local GPUs are available. However, for GPU-communication-intensive applications, remote GPU communication overhead may account for more than 70% of execution time. A job needs a remote GPU only when its assigned node does not have enough free local GPUs, yet a local GPU may become available later. We propose mrCUDA, a middleware for migrating execution from a remote GPU to a local GPU on demand. Our evaluation shows that for long-running jobs, mrCUDA's overhead accounts for less than 1% of total execution time. In addition, by applying mrCUDA to the first-come-first-serve (FCFS) job scheduling algorithm, we reduced job lifetimes (waiting plus execution time) by as much as 30% on average without changing the scheduling policy.
Graphics processing units, Mathematical model, Middleware, Scheduling algorithms, Runtime, Bandwidth, Libraries
P. Markthub, A. Nomura and S. Matsuoka, "Serving More GPU Jobs, with Low Penalty, Using Remote GPU Execution and Migration," 2016 IEEE International Conference on Cluster Computing (CLUSTER), Taipei, Taiwan, 2016, pp. 485-488.
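The scheduling idea in the abstract can be illustrated with a toy discrete-event simulation. The sketch below is not the paper's implementation; it is an illustrative model under simplifying assumptions (uniform nodes, jobs given in arrival order, no modeling of remote-communication or migration overhead): under plain FCFS a job waits until a single node has enough free GPUs, while with remote GPU execution it may also start by borrowing free GPUs on other nodes, which is how remote execution can shorten average job lifetime. The function name and job-tuple format are hypothetical.

```python
import heapq

def fcfs_avg_lifetime(jobs, nodes, gpus_per_node, allow_remote):
    """FCFS batch-scheduler sketch. `jobs` is a list of
    (arrival, gpus_needed, runtime) tuples in arrival order.
    Without remote execution a job must fit on a single node; with it,
    a job may also start by borrowing free GPUs on other nodes
    (mrCUDA would then migrate to local GPUs later; not modeled here)."""
    free = [gpus_per_node] * nodes
    running = []                        # min-heap of (finish_time, allocation)
    clock, total_lifetime = 0.0, 0.0
    for arrival, need, runtime in jobs:
        clock = max(clock, arrival)
        while True:
            while running and running[0][0] <= clock:
                _, alloc = heapq.heappop(running)
                for node, cnt in alloc:
                    free[node] += cnt   # release GPUs of finished jobs
            if any(f >= need for f in free) or (allow_remote and sum(free) >= need):
                break
            clock = running[0][0]       # wait for the next job completion
        alloc = []
        if any(f >= need for f in free):    # prefer a single (all-local) node
            node = next(i for i in range(nodes) if free[i] >= need)
            free[node] -= need
            alloc.append((node, need))
        else:                               # borrow remote GPUs greedily
            remaining = need
            for node in range(nodes):
                take = min(free[node], remaining)
                if take:
                    free[node] -= take
                    alloc.append((node, take))
                    remaining -= take
        heapq.heappush(running, (clock + runtime, alloc))
        total_lifetime += clock + runtime - arrival
    return total_lifetime / len(jobs)
```

For example, with two 3-GPU nodes and three 2-GPU jobs arriving at once, the third job is blocked by fragmentation (one free GPU on each node) under plain FCFS, but starts immediately when remote borrowing is allowed, lowering the average lifetime.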