2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (2018)
Washington, DC, USA
May 1, 2018 to May 4, 2018
Deep learning is now the most promising approach to developing computer systems with human-like intelligence. To speed up the development of neural networks, researchers have designed many distributed learning algorithms to facilitate the training process. In these algorithms, a constant specifies the communication period for model/gradient exchange. We find that this fixed communication pattern can incur unnecessary and inefficient data transmission for some training methods, e.g., elastic SGD and gossiping SGD. In this paper, we propose an adaptive communication method to improve the performance of gossiping SGD. Instead of using a fixed period for model exchange, each machine exchanges its model with other machines according to how much its local model has changed. This makes the communication more efficient and thus improves the performance. The experimental results show that, compared with gossiping SGD, our method reduces communication traffic by 92%, which results in a 52% reduction in training time while preserving prediction accuracy.
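The idea of triggering model exchange by local model change, rather than by a fixed period, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact trigger criterion, so the L2-distance test, the `threshold` parameter, the symmetric pairwise averaging, and the names `Worker`, `should_exchange`, `gossip_with`, and `run` are all assumptions made for illustration, and the toy quadratic objective stands in for a real neural network.

```python
import math
import random


def l2_distance(a, b):
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Worker:
    """Hypothetical worker that exchanges its model only after it has drifted enough."""

    def __init__(self, weights, threshold):
        self.weights = list(weights)
        self.last_shared = list(weights)  # snapshot taken at the last exchange
        self.threshold = threshold        # assumed trigger: L2 change since last exchange
        self.exchanges = 0

    def local_step(self, grad, lr):
        # Plain SGD update on the local model.
        self.weights = [w - lr * g for w, g in zip(self.weights, grad)]

    def should_exchange(self):
        # Adaptive rule (assumption): communicate only when the local model
        # has moved at least `threshold` away from the last shared snapshot.
        return l2_distance(self.weights, self.last_shared) >= self.threshold

    def gossip_with(self, peer):
        # Simplified gossip step: both sides adopt the average of the two models.
        avg = [(a + b) / 2 for a, b in zip(self.weights, peer.weights)]
        self.weights, peer.weights = list(avg), list(avg)
        self.last_shared, peer.last_shared = list(avg), list(avg)
        self.exchanges += 1


def run(num_workers=4, steps=200, threshold=0.5, lr=0.1, seed=0):
    """Toy simulation: minimize 0.5 * ||w||^2 with noisy gradients."""
    rng = random.Random(seed)
    workers = [
        Worker([rng.uniform(-1, 1) for _ in range(3)], threshold)
        for _ in range(num_workers)
    ]
    for _ in range(steps):
        for w in workers:
            grad = [x + rng.gauss(0, 0.1) for x in w.weights]
            w.local_step(grad, lr)
            if w.should_exchange():
                peer = rng.choice([p for p in workers if p is not w])
                w.gossip_with(peer)
    return workers
```

In this sketch, a fixed-period scheme would correspond to calling `gossip_with` every k steps regardless of drift. The adaptive rule skips exchanges while the local model is nearly unchanged, which is the kind of traffic saving the abstract describes; the numbers a toy run produces say nothing about the paper's reported results.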
data communication, gradient methods, graphics processing units, learning (artificial intelligence), neural nets, pattern clustering, stochastic processes, telecommunication traffic
L. Ho, J. Wu and P. Liu, "Adaptive Communication for Distributed Deep Learning on Commodity GPU Cluster," 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Washington, DC, USA, 2018, pp. 283-290.