Coded Elastic Computing on Machines With Heterogeneous Storage and Computation Speed

Bibliographic Details
Published in: IEEE Transactions on Communications, Vol. 69, No. 5, pp. 2894-2908
Main Authors: Woolsey, Nicholas; Chen, Rong-Rong; Ji, Mingyue
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), May 2021
ISSN: 0090-6778, 1558-0857
DOI: 10.1109/TCOMM.2021.3056089

Summary: We study the optimal design of heterogeneous Coded Elastic Computing (CEC), in which machines have varying computation speeds and storage capacities. CEC, introduced by Yang et al. in 2018, is a framework that mitigates the impact of elastic events, in which machines can join and leave at arbitrary times. In CEC, data is distributed among machines using a Maximum Distance Separable (MDS) code such that subsets of machines can perform the desired computations. However, state-of-the-art CEC designs operate only on homogeneous networks, where all machines have the same speed and storage, an assumption that may not hold in practice. In this work, building on an MDS storage assignment, we develop a novel computation assignment approach for heterogeneous CEC networks that minimizes the overall computation time. We first consider the scenario where machines have heterogeneous computation speeds but the same storage, and then the scenario where both heterogeneities are present. We propose a novel combinatorial optimization formulation and solve it exactly by decomposing it into a convex optimization problem, which finds the optimal computation load, and a filling problem, which finds the exact computation assignment. We adapt a low-complexity filling algorithm that completes in at most as many iterations as there are available machines.
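The summary only outlines the two-stage solution, so the following is a minimal Python sketch of both stages for the speed-only case, under an assumed simplified model that is not taken from the paper. The model assumes that every available machine stores one MDS-coded block, that each row of the result must be computed by exactly L machines, and that machine n processes a fraction mu_n of its rows at speed s_n. The load stage then minimizes max(mu_n / s_n) subject to sum(mu_n) = L and 0 <= mu_n <= 1. The filling stage uses McNaughton's wrap-around rule, a classic scheduling technique swapped in here for illustration rather than the authors' filling algorithm. The names optimal_loads and wrap_around_assignment are hypothetical.

```python
import math
from fractions import Fraction


def optimal_loads(speeds, L):
    """Hypothetical load stage: choose row fractions mu_n that minimize
    max(mu_n / s_n) subject to sum(mu) == L and 0 <= mu_n <= 1.

    Assumed model (illustration only): each row must be computed by
    exactly L of the available machines. Speeds must be positive.
    """
    n = len(speeds)
    if n < L:
        raise ValueError("need at least L available machines")
    s = [Fraction(x) for x in speeds]
    srt = sorted(s, reverse=True)  # fastest machines first
    # Guess k = number of fastest machines saturated at mu = 1; the rest
    # get mu_n = t * s_n, where t is the common finishing time.
    for k in range(L):
        t = (L - k) / sum(srt[k:])
        if t * srt[k] <= 1 and (k == 0 or t * srt[k - 1] >= 1):
            mu = [min(Fraction(1), t * x) for x in s]
            return mu, t
    raise AssertionError("unreachable for positive speeds")


def wrap_around_assignment(mu):
    """McNaughton-style wrap-around filling (swapped-in technique).

    Machines take consecutive segments of [0, L); reducing positions
    mod 1 maps them onto row fractions in [0, 1). If sum(mu) == L and
    every mu_n <= 1, each row fraction is covered by exactly L machines
    and no machine covers the same row twice.
    """
    intervals = []
    pos = Fraction(0)
    for m in mu:
        base = math.floor(pos)
        a, b = pos - base, pos + m - base
        if m == 0:
            segs = []
        elif b <= 1:
            segs = [(a, b)]
        else:  # segment wraps past the end of the row range
            segs = [(a, Fraction(1)), (Fraction(0), b - 1)]
        intervals.append(segs)
        pos += m
    return intervals


if __name__ == "__main__":
    mu, t = optimal_loads([1, 1, 2], L=2)
    print(mu, t)
    print(wrap_around_assignment(mu))
```

With speeds [1, 1, 2] and L = 2, the sketch assigns loads 1/2, 1/2 and 1, so all three machines finish at time 1/2. Exact Fraction arithmetic keeps the coverage property free of rounding error; a practical system would round the intervals to whole rows.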