2016 2nd Workshop on Machine Learning in HPC Environments (MLHPC)
Salt Lake City, Utah, USA
Nov. 14, 2016
We study data-parallel training of deep neural networks on high-performance computing infrastructure. The key obstacle to scaling data-parallel training is a severe communication/computation imbalance. We explore quantizing gradient updates before communication to reduce bandwidth requirements, and compare against a baseline implementation that uses the MPI allreduce routine. We port two existing quantization approaches, one-bit and threshold quantization, and develop our own adaptive quantization algorithm. We evaluate these algorithms against MPI_Allreduce when training models on the MNIST dataset and on a synthetic benchmark. On an HPC system, MPI_Allreduce outperforms the existing quantization approaches, while our adaptive quantization is comparable or superior for large layers without sacrificing accuracy. It is 1.76 times faster than the next-best approach for the largest layers in our benchmark and achieves near-linear speedup in data-parallel training.
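To make the gradient-quantization idea concrete, here is a minimal sketch of one-bit quantization with error feedback, in the style of the scheme this paper ports: each gradient entry is reduced to its sign, reconstruction uses per-bucket mean magnitudes, and the quantization residual is carried into the next step so no information is permanently lost. Function names and reconstruction choices here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize_one_bit(grad, error):
    """Illustrative one-bit quantizer with error feedback.

    Returns the sign bits (1 bit per entry), the two reconstruction
    values, and the residual error to carry into the next iteration.
    """
    corrected = grad + error              # fold in last step's residual
    signs = corrected >= 0                # 1 bit per gradient entry
    # Reconstruct each bucket by its mean value (one common choice)
    pos_mean = corrected[signs].mean() if signs.any() else 0.0
    neg_mean = corrected[~signs].mean() if (~signs).any() else 0.0
    decoded = np.where(signs, pos_mean, neg_mean)
    new_error = corrected - decoded       # residual kept locally
    return signs, (pos_mean, neg_mean), new_error

grad = np.array([0.5, -0.2, 0.1, -0.4])
bits, (p, n), err = quantize_one_bit(grad, np.zeros_like(grad))
```

In a data-parallel setting, only the sign bits and the two reconstruction values would be communicated per layer, a roughly 32x reduction over sending full single-precision gradients, which is the bandwidth saving the paper's comparison against MPI_Allreduce is measuring.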
N. Dryden, T. Moon, S. A. Jacobs and B. V. Essen, "Communication Quantization for Data-Parallel Training of Deep Neural Networks," 2016 2nd Workshop on Machine Learning in HPC Environments (MLHPC), Salt Lake City, Utah, USA, 2016, pp. 1-8.