Novel Gradient Sparsification Algorithm via Bayesian Inference
| Published in | 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP) pp. 1 - 6 |
|---|---|
| Main Authors | , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 22.09.2024 |
| Subjects | |
| ISSN | 2161-0371 |
| DOI | 10.1109/MLSP58920.2024.10734719 |
| Summary: | Error accumulation is an essential component of the Top-k sparsification method in distributed gradient descent. It implicitly scales the learning rate and prevents the slow-down of lateral movement, but it can also deteriorate convergence. This paper proposes a novel sparsification algorithm called regularized Top-k (REGTop-k) that controls the learning-rate scaling caused by error accumulation. The algorithm is developed by treating gradient sparsification as an inference problem and determining a Bayesian-optimal sparsification mask via maximum-a-posteriori estimation. It uses past aggregated gradients to evaluate posterior statistics, based on which it prioritizes the local gradient entries. Numerical experiments with ResNet-18 on CIFAR-10 show that at 0.1% sparsification, REGTop-k achieves about 8% higher accuracy than standard Top-k. |
|---|---|
| ISSN: | 2161-0371 |
| DOI: | 10.1109/MLSP58920.2024.10734719 |
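
For context, the summary contrasts REGTop-k with the standard Top-k baseline that uses error accumulation (error feedback). The sketch below illustrates only that well-known baseline, not the paper's REGTop-k method; the function name `topk_with_error_feedback`, the NumPy implementation, and the example dimensions are illustrative assumptions.

```python
# Minimal sketch of the standard Top-k sparsification baseline with error
# accumulation (error feedback). This is NOT the paper's REGTop-k algorithm.
import numpy as np

def topk_with_error_feedback(grad, memory, k):
    """Sparsify `grad` to its k largest-magnitude entries, carrying the
    discarded remainder in `memory` so it is added back at the next step."""
    corrected = grad + memory                          # add accumulated error to the fresh gradient
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # indices of the k largest magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                       # keep only the selected entries
    new_memory = corrected - sparse                    # everything not transmitted is accumulated
    return sparse, new_memory

# Example: 0.1% sparsification of a 10,000-dimensional gradient
rng = np.random.default_rng(0)
g = rng.normal(size=10_000)
mem = np.zeros_like(g)
sparse_g, mem = topk_with_error_feedback(g, mem, k=10)
print(np.count_nonzero(sparse_g))  # 10
```

The `memory` term carries the discarded gradient mass forward between iterations; this is the error-accumulation mechanism that, per the summary, implicitly scales the learning rate and that REGTop-k is designed to regularize.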