A Review of Data Placement and Replication Strategies Based on Machine Learning

Bibliographic Details
Published in: Proceedings - International Conference on Parallel and Distributed Systems, pp. 278 - 285
Main Authors: Najjar, Amir; Mokadem, Riad; Pierson, Jean-Marc
Format: Conference Proceeding
Language: English
Published: IEEE, 10.10.2024
ISSN: 2690-5965
DOI: 10.1109/ICPADS63350.2024.00044

Summary: The global growth in data volumes has created a need for scalable distributed systems that provide satisfactory quality of service. Data placement and replication are well-known techniques for increasing performance, improving fault tolerance and raising availability. These techniques often rely on threshold-based activation mechanisms whose appropriate thresholds vary with the nature of the workload and the underlying system architecture. Hence, setting and adjusting those thresholds usually requires human intervention. In this context, machine learning offers a promising way to define such thresholds automatically so that they adapt to different workloads and architectures. In this paper, we study data placement and replication strategies from the literature that employ machine learning. We classify these strategies by machine learning method, deployment platform, dynamicity and achieved objectives. We describe the approach applied by each strategy as well as its possible limitations. In addition, we provide insights into the metrics used to evaluate the strategies. We highlight the need to design data placement and replication strategies that better meet the modern needs of distributed systems. We also motivate the use of machine learning to achieve autonomy in distributed systems.
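The summary describes replication that is activated when a metric crosses a threshold, and it proposes learning that threshold rather than hand-tuning it. The sketch below illustrates the idea under stated assumptions and is not a method from the surveyed strategies. It assumes the metric is a per-window access count and uses a hypothetical cost model: a fixed cost per replica and a unit penalty per access to a non-replicated object. An epsilon-greedy multi-armed bandit then selects the threshold from a candidate set. All function and class names, the cost constants and the synthetic workload are illustrative.

```python
import random
from collections import Counter


def objects_to_replicate(access_log, threshold):
    """Threshold-based activation: return the objects whose access count in
    the current window exceeds `threshold` (candidates for a new replica)."""
    counts = Counter(access_log)
    return {obj for obj, n in counts.items() if n > threshold}


def window_reward(access_log, replicated, replica_cost=5.0, remote_cost=1.0):
    """Hypothetical cost model (an assumption): each replica costs
    `replica_cost`, and each access to a non-replicated object costs
    `remote_cost`. The reward is the negated total cost."""
    remote_accesses = sum(1 for obj in access_log if obj not in replicated)
    return -(replica_cost * len(replicated) + remote_cost * remote_accesses)


class EpsilonGreedyThreshold:
    """Epsilon-greedy bandit that learns which candidate threshold yields the
    highest average reward, instead of relying on a hand-tuned constant."""

    def __init__(self, candidates, epsilon=0.1, seed=0):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = {t: 0.0 for t in self.candidates}
        self.pulls = {t: 0 for t in self.candidates}

    def _mean(self, t):
        return self.totals[t] / self.pulls[t]

    def choose(self):
        # Try every candidate once before exploiting.
        untried = [t for t in self.candidates if self.pulls[t] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.candidates)
        return max(self.candidates, key=self._mean)

    def update(self, threshold, reward):
        self.totals[threshold] += reward
        self.pulls[threshold] += 1

    def best(self):
        tried = [t for t in self.candidates if self.pulls[t] > 0]
        return max(tried, key=self._mean)


def tune_threshold(window, candidates, rounds=50, seed=0):
    """Replay one observed access window `rounds` times and return the
    threshold with the highest average reward."""
    learner = EpsilonGreedyThreshold(candidates, seed=seed)
    for _ in range(rounds):
        t = learner.choose()
        replicated = objects_to_replicate(window, t)
        learner.update(t, window_reward(window, replicated))
    return learner.best()


if __name__ == "__main__":
    # Synthetic access window (illustrative only): object IDs, one per read.
    window = ["a"] * 12 + ["b"] * 6 + ["c"] * 3 + ["d"]
    print(tune_threshold(window, [0, 2, 5, 10, 20]))  # prints 5
```

With a replica cost of 5 and a remote-access penalty of 1, replicating an object pays off only when it receives more than five reads per window, so the learner settles on a threshold of 5. Changing the cost constants or the workload shifts the learned threshold without manual retuning. In a real deployment each round would observe a fresh window rather than replaying a single one.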