Optimization-Based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning

Bibliographic Details
Published in: IEEE Transactions on Signal Processing, Vol. 71, pp. 1023-1038
Main Authors: Wang, Qi; Cui, Ying; Li, Chenglin; Zou, Junni; Xiong, Hongkai
Format: Journal Article
Language: English
Published: New York: IEEE, 2023
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1053-587X, 1941-0476
DOI: 10.1109/TSP.2023.3244084


More Information
Summary: Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in the coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for partial stragglers, as they cannot utilize incomplete computation results from partial stragglers. This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. Specifically, we consider a distributed system consisting of one master and N workers, characterized by a general partial straggler model, and focus on solving a general large-scale machine learning problem with L model parameters using gradient coding. First, we propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which includes most existing gradient coding schemes as special cases. Then, we consider the minimization of the expected overall runtime and the maximization of the completion probability with respect to the L coding parameters for the coordinates, both of which are challenging discrete optimization problems.
To reduce computational complexity, we first transform each problem into an equivalent but much simpler discrete problem with N ≪ L variables representing the partition of the L coordinates into N blocks, each with identical redundancy. This yields an equivalent but more easily implemented block coordinate gradient coding scheme with N coding parameters for the blocks. We then adopt continuous relaxation to further reduce computational complexity. For the resulting minimization of the expected overall runtime, we develop an iterative algorithm of computational complexity O(N^2) to obtain an optimal solution and derive two closed-form approximate solutions, both with computational complexity O(N). For the resulting maximization of the completion probability, we develop an iterative algorithm of computational complexity O(N^2) to obtain a stationary point and derive a closed-form approximate solution with computational complexity O(N) at a large threshold. Finally, numerical results show that the proposed solutions significantly outperform existing coded computation schemes and their extensions.
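The core idea of the block scheme, partitioning the L coordinates into N blocks and assigning each block its own redundancy across workers, can be sketched as below. This is a minimal illustration using simple repetition-style replication with cyclic placement; the function names and the replication rule are assumptions for illustration, not the paper's exact code construction or optimization algorithms.

```python
def assign_blocks(num_workers, block_redundancy):
    """Replicate block n on block_redundancy[n] consecutive workers (cyclic placement).

    Returns a dict mapping each worker to the list of blocks it stores.
    Higher redundancy for a block means more workers can supply its
    partial derivatives, so that block tolerates more stragglers.
    """
    assignment = {w: [] for w in range(num_workers)}
    for block, r in enumerate(block_redundancy):
        for k in range(r):
            assignment[(block + k) % num_workers].append(block)
    return assignment


def recoverable_blocks(assignment, finished_workers):
    """Under replication, a block's gradient is recoverable once any worker
    holding that block has finished its computation."""
    done = set()
    for w in finished_workers:
        done.update(assignment[w])
    return done


# Example: 4 workers, 4 blocks; blocks 0 and 1 carry redundancy 2,
# blocks 2 and 3 carry redundancy 1.
assignment = assign_blocks(4, [2, 2, 1, 1])
# If only workers 0 and 1 finish, the redundant blocks 0 and 1 are
# still recoverable, while blocks 2 and 3 are not.
print(recoverable_blocks(assignment, {0, 1}))
```

The paper's contribution is choosing the N per-block redundancies (equivalently, the partition sizes) optimally, via the O(N^2) iterative algorithms and O(N) closed-form approximations described above, rather than using a fixed uniform redundancy.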