Guided parallelized stochastic gradient descent for delay compensation
Published in: Applied Soft Computing, Vol. 102, p. 107084
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.04.2021
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2021.107084
Summary: Stochastic gradient descent (SGD) and its variants have been used effectively to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice because, by nature, it optimizes the error function sequentially. This has led to parallel SGD algorithms, such as asynchronous SGD (ASGD) and synchronous SGD (SSGD), for training deep neural networks. These parallel variants, however, introduce high variance because of the delay in parameter (weight) updates. Our proposed algorithm addresses this delay and aims to minimize its impact. We employ guided SGD (gSGD), which encourages consistent examples to steer convergence by compensating for the unpredictable deviation caused by the delay. Its convergence rate is similar to that of ASGD/SSGD; however, some additional (parallel) processing is required to compensate for the delay. The experimental results demonstrate that the proposed approach mitigates the impact of the delay on classification accuracy. The guided approach with SSGD clearly outperforms ASGD and SSGD, and even achieves an accuracy close to that of sequential SGD on some benchmark datasets.
Highlights:
• Its convergence rate of O(1/(ρT) + σ²) shows its applicability to real-time systems.
• The proposed method outperforms synchronous/asynchronous SGD.
• The proposed method is compatible with other variations of SGD, such as RMSprop.
• The delay in parameter updates occurs because several gradients are computed in parallel.
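The sketch below is not the authors' gSGD implementation; it is a minimal, self-contained illustration of the setting the abstract describes, under assumed details: a quadratic loss on synthetic data, a fixed staleness of `delay` steps to mimic parallel workers reading old parameters, and a hypothetical cosine-similarity weighting of per-example gradients standing in for the idea that "consistent examples steer the convergence". The function names (`per_example_grads`, `delayed_sgd`) and all hyperparameters are illustrative assumptions.

```python
# Sketch only: stale-gradient SGD vs. a consistency-weighted ("guided") variant.
# NOT the paper's gSGD algorithm; the weighting rule is a hypothetical stand-in.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression problem: y = X @ w_true + noise.
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def per_example_grads(w, idx):
    """Gradient of 0.5 * (x.w - y)^2 for each example in the mini-batch."""
    Xb, yb = X[idx], y[idx]
    residual = Xb @ w - yb                      # shape (batch,)
    return residual[:, None] * Xb               # shape (batch, d)

def delayed_sgd(guided, delay=4, lr=0.05, steps=400, batch=32):
    w = np.zeros(d)
    history = [w.copy()]                        # past parameter snapshots
    for _ in range(steps):
        # A parallel worker reads parameters that are `delay` steps stale.
        w_stale = history[max(0, len(history) - 1 - delay)]
        idx = rng.integers(0, n, size=batch)
        grads = per_example_grads(w_stale, idx)
        mean_grad = grads.mean(axis=0)
        if guided:
            # Hypothetical "guidance": weight each example by the cosine
            # similarity of its gradient with the mini-batch mean, so examples
            # consistent with the common direction dominate the update and the
            # stale-gradient deviation is damped.
            sims = grads @ mean_grad
            norms = np.linalg.norm(grads, axis=1) * np.linalg.norm(mean_grad) + 1e-12
            weights = np.clip(sims / norms, 0.0, None)
            weights = weights / (weights.sum() + 1e-12)
            update = (weights[:, None] * grads).sum(axis=0)
        else:
            update = mean_grad
        w = w - lr * update
        history.append(w.copy())
    return 0.5 * np.mean((X @ w - y) ** 2)      # final training loss

print("delayed SGD loss         :", delayed_sgd(guided=False))
print("delayed 'guided' SGD loss:", delayed_sgd(guided=True))
```

Increasing `delay` makes the plain variant noisier, since updates are applied with parameters that have drifted further; the weighting step is included only to show where a delay-compensation term would act, not to reproduce the paper's reported results.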