Algorithm-Based Fault Tolerance Applied to P2P Computing Networks


Bibliographic Details
Published in	2009 First International Conference on Advances in P2P Systems, pp. 144-149
Main Authors	Roche, T., Cunche, M., Roch, J.-L.
Format	Conference Proceeding
Language	English
Published	IEEE 01.10.2009
Subjects	ABFT; Checkpointing; Computer networks; distributed computing; Fault tolerance; Fault tolerant systems; Galois fields; High performance computing; Linear code; linear coding; P2P; Parity check codes; Peer to peer computing; Reed-Solomon codes; SUMMA
ISBN	1424450845 9781424450848
DOI	10.1109/AP2PS.2009.30

Summary:	P2P computing platforms are subject to a wide range of attacks. In this paper, we propose a generalisation of the earlier disk-less checkpointing approach to fault tolerance in high performance computing systems. Our contribution is twofold. First, instead of restricting ourselves to 2D checksums, which tolerate only a small number of node failures, we base disk-less checkpointing on linear codes so that a potentially large number of faults can be tolerated. Second, we compare low-density parity-check (LDPC) codes with classical Reed-Solomon (RS) codes under different fault models suited to P2P systems. Our LDPC disk-less checkpointing method is well suited to settings where only node disconnections are considered, but it cannot deal with Byzantine peers. Our RS disk-less checkpointing method tolerates such Byzantine errors, but is restricted to exact finite-field computations.
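
The idea in the summary of basing disk-less checkpointing on a linear code, computed exactly in a finite field, can be illustrated with a minimal sketch. It is not the construction from the paper: the prime `P`, the Vandermonde-style coefficients, and the function names `encode`, `solve_mod` and `recover` are all assumptions made for illustration. The sketch covers only the node-disconnection (erasure) case, where the set of lost nodes is known, and does not attempt to detect or correct Byzantine errors.

```python
# Illustrative sketch only: diskless checkpointing with a linear checksum
# code over the prime field GF(P). P and the coefficients are assumptions.
P = 257  # small prime, so all arithmetic below is exact modulo P


def encode(data, r):
    """Return r checksum vectors; checksum j is sum_i (i+1)^j * data[i] mod P."""
    k, n = len(data), len(data[0])
    return [[sum(pow(i + 1, j, P) * data[i][t] for i in range(k)) % P
             for t in range(n)]
            for j in range(r)]


def solve_mod(A, b):
    """Solve A x = b over GF(P) by Gauss-Jordan elimination (A invertible)."""
    m = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(m):
        # Find a row with a nonzero entry in column c and move it into place.
        piv = next(i for i in range(c, m) if M[i][c] % P)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, P)  # modular inverse (Python 3.8+)
        M[c] = [v * inv % P for v in M[c]]
        for i in range(m):
            if i != c and M[i][c]:
                f = M[i][c]
                M[i] = [(vi - f * vc) % P for vi, vc in zip(M[i], M[c])]
    return [M[i][m] for i in range(m)]


def recover(data, checks, lost):
    """Rebuild the states of the lost data nodes; needs len(lost) <= len(checks)."""
    k, n, e = len(data), len(checks[0]), len(lost)
    alive = [i for i in range(k) if i not in lost]
    # e x e Vandermonde block restricted to the lost nodes (distinct points).
    A = [[pow(i + 1, j, P) for i in lost] for j in range(e)]
    rebuilt = {i: [] for i in lost}
    for t in range(n):
        # Subtract the surviving nodes' contribution from each checksum.
        b = [(checks[j][t]
              - sum(pow(i + 1, j, P) * data[i][t] for i in alive)) % P
             for j in range(e)]
        for i, v in zip(lost, solve_mod(A, b)):
            rebuilt[i].append(v)
    return rebuilt


# Usage: four data nodes, two checksum nodes, then nodes 1 and 3 disconnect.
states = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
checks = encode(states, 2)
damaged = [states[0], None, states[2], None]
restored = recover(damaged, checks, [1, 3])
```

With r checksum nodes this sketch can rebuild up to r lost data nodes, provided the checksum nodes themselves survive and the number of data nodes stays below `P`. The values must already be integers modulo `P`, which mirrors the restriction to exact finite-field computations noted in the summary.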