2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) (2017)
Austin, Texas, USA
Feb. 4, 2017 to Feb. 8, 2017
ISSN: 2378-203X
ISBN: 978-1-5090-4985-1
pp: 1-12
Accelerating Convolutional Neural Networks (CNNs) on GPUs usually involves two stages: training and inference. Traditionally, this two-stage process is deployed on high-end GPU-equipped servers. Driven by the increase in compute power of desktop and mobile GPUs, there is growing interest in performing inference on various kinds of platforms. In contrast to the training stage, which requires high throughput and accuracy, inference tasks come with diverse end-user requirements. To address this emerging trend and these new requirements, we propose Pervasive CNN (P-CNN), a user satisfaction-aware CNN inference framework. P-CNN is composed of two phases: cross-platform offline compilation and run-time management. Based on users' requirements, offline compilation generates the optimal kernel using architecture-independent techniques, such as adaptive batch size selection and coordinated fine-tuning. The run-time management phase consists of accuracy tuning, execution, and calibration. First, accuracy tuning dynamically identifies the fastest kernels with acceptable accuracy. Next, the run-time kernel scheduler partitions the optimal computing resource for each layer and schedules the GPU thread blocks. If the resulting accuracy is not acceptable to the end user, the calibration stage selects a slower but more precise kernel to improve the accuracy. Finally, we design a user satisfaction metric for CNNs to evaluate our pervasive design. Our evaluation results show that P-CNN can provide the best user satisfaction for different inference tasks.
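The accuracy tuning and calibration steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Kernel` record, the `tune` and `calibrate` functions, and all latency and accuracy figures are hypothetical, chosen only to show the selection logic (fastest acceptable kernel first, then a slower but more precise one on request).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """Hypothetical description of one compiled kernel variant."""
    name: str
    latency_ms: float  # illustrative per-inference latency
    accuracy: float    # illustrative estimated accuracy in [0, 1]


def tune(kernels, min_accuracy):
    """Accuracy tuning: pick the fastest kernel with acceptable accuracy.

    If no kernel meets the threshold, fall back to the most accurate one
    (an assumption of this sketch, not stated in the abstract).
    """
    acceptable = [k for k in kernels if k.accuracy >= min_accuracy]
    if not acceptable:
        return max(kernels, key=lambda k: k.accuracy)
    return min(acceptable, key=lambda k: k.latency_ms)


def calibrate(kernels, current):
    """Calibration: move to a more precise kernel than the current one.

    Among kernels with strictly higher accuracy, take the fastest.
    Return the current kernel unchanged if none is more precise.
    """
    more_precise = [k for k in kernels if k.accuracy > current.accuracy]
    if not more_precise:
        return current
    return min(more_precise, key=lambda k: k.latency_ms)


# Made-up kernel variants for illustration only.
KERNELS = [
    Kernel("fast", latency_ms=5.0, accuracy=0.80),
    Kernel("balanced", latency_ms=9.0, accuracy=0.88),
    Kernel("precise", latency_ms=20.0, accuracy=0.93),
]

chosen = tune(KERNELS, min_accuracy=0.85)
fallback = calibrate(KERNELS, chosen)
```

With these made-up numbers, tuning selects the `balanced` variant, and a calibration request from an unsatisfied user moves to `precise`. The per-layer resource partitioning and thread-block scheduling mentioned in the abstract are outside the scope of this sketch.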
Feature extraction, Frequency modulation, Real-time systems, Computer architecture, Convolution, Support vector machines, Entropy

M. Song, Y. Hu, H. Chen and T. Li, "Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures," 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), Austin, Texas, USA, 2017, pp. 1-12.