MSF-Model: Queuing-Based Analysis and Prediction of Metastable Failures in Replicated Storage Systems

Bibliographic Details
Published in Proceedings - Symposium on Reliable Distributed Systems pp. 12 - 22
Main Authors Habibi, Farzad; Lorido-Botran, Tania; Showail, Ahmad; Sturman, Daniel C.; Nawab, Faisal
Format Conference Proceeding
Language English
Published IEEE 30.09.2024
ISSN 2575-8462
DOI 10.1109/SRDS64841.2024.00013

Summary: Metastable failure is a recent abstraction of a failure pattern that occurs frequently in real-world distributed storage systems. In this paper, we propose a formal analysis and model of metastable failures in replicated storage systems. We focus on a foundational problem in distributed systems, the problem of consensus, so that the results apply to a large class of systems. Our main contribution is a queuing-based analytical model, MSF-Model, that can be used to characterize and predict metastable failures. MSF-Model integrates novel modeling concepts that make it possible to model metastable failures, which was intractable prior to our work. We also run real experiments to reproduce metastable failures and validate our model. Comparing these experiments with the predictions of the queuing-based model shows that MSF-Model predicts metastable failures with high accuracy.