Optimal Code Regeneration with Background Traffic Awareness in Distributed Storage

Bibliographic Details
Published in	2018 International Conference on Computing, Networking and Communications (ICNC), pp. 48-52
Main Authors	Tao, Yangyang; Yu, Shucheng; Yoshigoe, Kenji; Zhou, Junxiu
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2018
Subjects	Bandwidth; Cloud computing; Cloud Storage; Data Regeneration; Heuristic algorithms; Linear Programming; Maintenance engineering; Markov Chain; Markov processes; Multi-commodity Flow; Optimization; Predictive models; SRB-X
DOI	10.1109/ICCNC.2018.8390329

More Information
Summary:	In cloud storage systems, a certain degree of data redundancy is important for data availability. Corrupted or lost data shares must be regenerated promptly to meet the mean time to recovery (MTTR) reliability requirements usually defined in Service Level Agreements (SLAs). Current data regeneration techniques usually assume uniform and/or unlimited network capacity and ignore the practical impact of background traffic and cloud network architecture. This paper proposes a more realistic regeneration strategy that takes these impacts into consideration. Specifically, our approach first extracts an information flow graph from the BCube network architecture, based on which the real-time network status is predicted using a Markov chain model. The optimal code regeneration strategy is then formulated as a linear programming (LP) problem that minimizes the sub-flow rate on bottleneck links subject to the constraints of real-time network dynamics. Finally, a distributed multi-commodity flow dynamic routing (MFDR) approximation algorithm is proposed to solve the code regeneration LP. Simulation results indicate that, on average, the proposed distributed algorithm reduces data regeneration time by 16.5% relative to RCTREE and by 45.3% relative to HDFS.
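The two ideas named in the summary, Markov chain prediction of link status and a bottleneck-driven regeneration time, can be illustrated with a minimal sketch. This is not the paper's MFDR algorithm or its LP formulation. The two-state link model, the transition probabilities, the bandwidth figures, and all function names below are assumptions chosen only for illustration.

```python
def predict_state_distribution(current, transition, steps=1):
    """Propagate a probability distribution over link states through a
    Markov chain. transition[i][j] is the (assumed) probability of moving
    from state i to state j in one time step."""
    dist = list(current)
    n = len(dist)
    for _ in range(steps):
        dist = [sum(dist[i] * transition[i][j] for i in range(n))
                for j in range(n)]
    return dist


def expected_bandwidth(dist, state_bandwidth):
    """Expected available bandwidth given a distribution over link states."""
    return sum(p * b for p, b in zip(dist, state_bandwidth))


def bottleneck_time(flows, bandwidth):
    """Regeneration time is set by the slowest link:
    the maximum over links of (data to send) / (available bandwidth)."""
    return max(flows[link] / bandwidth[link] for link in flows)


# Hypothetical two-state link model: state 0 = idle, state 1 = busy.
TRANSITION = [[0.9, 0.1],
              [0.5, 0.5]]
STATE_BANDWIDTH = [100.0, 20.0]  # assumed MB/s left over for regeneration

# Link "a" is currently idle; predict its state one step ahead.
dist_a = predict_state_distribution([1.0, 0.0], TRANSITION, steps=1)
bw_a = expected_bandwidth(dist_a, STATE_BANDWIDTH)

# Assumed amount of regeneration data (MB) routed over each link.
flows = {"a": 460.0, "b": 100.0}
bandwidth = {"a": bw_a, "b": 50.0}
regen_time = bottleneck_time(flows, bandwidth)
```

In this toy setting, a routing strategy that shifts data away from the link with the largest ratio lowers `regen_time`. This is the intuition behind minimizing the sub-flow rate on bottleneck links.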