EventGraD: Event-Triggered Communication in Parallel Stochastic Gradient Descent

Bibliographic Details
Published in: 2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S), pp. 1-8
Main Authors: Ghosh, Soumyadip; Gupta, Vijay
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2020
DOI: 10.1109/MLHPCAI4S51975.2020.00008

Summary: Communication in parallel systems consumes a significant amount of time and energy, which often turns out to be a bottleneck in distributed machine learning. In this paper, we present EventGraD, an algorithm with event-triggered communication in parallel stochastic gradient descent. The main idea of this algorithm is to replace the requirement of communicating at every epoch with communicating only in certain epochs, when necessary. In particular, a parameter is communicated only in the event that the change in its value exceeds a threshold. The threshold for a parameter is chosen adaptively based on the rate of change of that parameter. The adaptive threshold ensures that the algorithm can be applied to different models on different datasets without any change. We focus on data-parallel training of a popular convolutional neural network on the MNIST dataset and show that EventGraD can reduce the communication load by up to 70% while retaining the same level of accuracy.
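
The abstract states the triggering rule (communicate a parameter only when its change since the last communication exceeds an adaptive threshold) but not the implementation. The following is a minimal Python/NumPy sketch of that rule under assumed details: the class name EventTriggeredSGD, the broadcast callback, and the exponential-smoothing update of the per-parameter threshold are illustrative choices made here, not taken from the paper.

import numpy as np

class EventTriggeredSGD:
    """Local SGD with event-triggered communication (illustrative sketch)."""

    def __init__(self, params, lr=0.01, threshold=1e-3, adapt=0.5):
        self.params = {k: v.copy() for k, v in params.items()}
        # Last value of each parameter that was actually communicated.
        self.last_sent = {k: v.copy() for k, v in params.items()}
        # One threshold per parameter, adapted over time.
        self.thresholds = {k: threshold for k in params}
        self.adapt = adapt   # smoothing factor for the threshold update (assumed rule)
        self.lr = lr

    def step(self, grads, broadcast):
        """Apply a local SGD step, then communicate only those parameters whose
        change since the last communication exceeds their threshold."""
        for name, g in grads.items():
            self.params[name] -= self.lr * g
            change = np.linalg.norm(self.params[name] - self.last_sent[name])
            if change > self.thresholds[name]:
                broadcast(name, self.params[name])            # event: send to peers
                self.last_sent[name] = self.params[name].copy()
                # Nudge the threshold toward the observed rate of change.
                self.thresholds[name] = (
                    self.adapt * self.thresholds[name] + (1 - self.adapt) * change
                )

A small usage example, with a stand-in broadcast callback that only records which epochs triggered an event:

rng = np.random.default_rng(0)
sent = []
opt = EventTriggeredSGD({"w": np.zeros(10)}, threshold=0.02)
for epoch in range(10):
    grads = {"w": rng.standard_normal(10)}
    opt.step(grads, broadcast=lambda name, value: sent.append((epoch, name)))
print(sent)   # only the epochs in which the change exceeded the threshold

In a real data-parallel setting the broadcast callback would exchange parameters with neighboring workers (e.g. via MPI) rather than append to a list; that machinery is omitted here.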