EventGraD: Event-Triggered Communication in Parallel Stochastic Gradient Descent

Bibliographic Details
Published in: 2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S), pp. 1-8
Main Authors: Ghosh, Soumyadip; Gupta, Vijay
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2020
DOI: 10.1109/MLHPCAI4S51975.2020.00008

Summary: Communication in parallel systems consumes a significant amount of time and energy, which often turns out to be a bottleneck in distributed machine learning. In this paper, we present EventGraD, an algorithm with event-triggered communication in parallel stochastic gradient descent. The main idea of this algorithm is to replace the requirement of communicating at every epoch with communicating only in certain epochs, when necessary. In particular, a parameter is communicated only in the event that the change in its value exceeds a threshold. The threshold for a parameter is chosen adaptively based on the rate of change of that parameter. The adaptive threshold ensures that the algorithm can be applied to different models on different datasets without any change. We focus on data-parallel training of a popular convolutional neural network on the MNIST dataset and show that EventGraD can reduce the communication load by up to 70% while retaining the same level of accuracy.
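
The abstract states the triggering rule (communicate a parameter only when its change since the last communication exceeds an adaptive threshold) but not the implementation. The following is a minimal Python/NumPy sketch of that rule under assumed details: the class name EventTriggeredSGD, the broadcast callback, and the exponential-smoothing update of the per-parameter threshold are illustrative choices made here, not taken from the paper.

import numpy as np

class EventTriggeredSGD:
    """Local SGD with event-triggered communication (illustrative sketch)."""

    def __init__(self, params, lr=0.01, threshold=1e-3, adapt=0.5):
        self.params = {k: v.copy() for k, v in params.items()}
        # Last value of each parameter that was actually communicated.
        self.last_sent = {k: v.copy() for k, v in params.items()}
        # One threshold per parameter, adapted over time.
        self.thresholds = {k: threshold for k in params}
        self.adapt = adapt   # smoothing factor for the threshold update (assumed rule)
        self.lr = lr

    def step(self, grads, broadcast):
        """Apply a local SGD step, then communicate only those parameters whose
        change since the last communication exceeds their threshold."""
        for name, g in grads.items():
            self.params[name] -= self.lr * g
            change = np.linalg.norm(self.params[name] - self.last_sent[name])
            if change > self.thresholds[name]:
                broadcast(name, self.params[name])            # event: send to peers
                self.last_sent[name] = self.params[name].copy()
                # Nudge the threshold toward the observed rate of change.
                self.thresholds[name] = (
                    self.adapt * self.thresholds[name] + (1 - self.adapt) * change
                )

A small usage example, with a stand-in broadcast callback that only records which epochs triggered an event:

rng = np.random.default_rng(0)
sent = []
opt = EventTriggeredSGD({"w": np.zeros(10)}, threshold=0.02)
for epoch in range(10):
    grads = {"w": rng.standard_normal(10)}
    opt.step(grads, broadcast=lambda name, value: sent.append((epoch, name)))
print(sent)   # only the epochs in which the change exceeded the threshold

In a real data-parallel setting the broadcast callback would exchange parameters with neighboring workers (e.g. via MPI) rather than append to a list; that machinery is omitted here.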