Cooperative Diagnosis and Routing in Fault-Tolerant Multiprocessor Systems

Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, Vol. 27, no. 2, pp. 205-211
Main Authors: Blough, D.M.; Wang, H.Y.
Format: Journal Article
Language: English
Published: San Diego, CA, Elsevier Inc, 01.06.1995
Elsevier
ISSN: 0743-7315, 1096-0848
DOI: 10.1006/jpdc.1995.1083

Summary: In this note, we consider the problem of fault-tolerant routing in multiprocessor systems when incomplete, or partial, diagnostic information is available. We first define a new type of partial diagnosis, known as k-reachability diagnosis. The overhead for k-reachability diagnosis increases with k, which specifies the radius of diagnostic information maintained by each node. We then present a routing algorithm, known as Algorithm Partial Route, that makes use of k-reachability diagnostic information and allows a trade-off between the amount of diagnostic information and the quality of routing. Partial Route is the first algorithm capable of handling systems of arbitrary topology containing an arbitrary number of faults. The worst-case performance of the algorithm in an n-node system is shown to be optimal when k = n − 1 and within a factor of 2 of optimal when k = 1. Simulation results on meshes and hypercubes are also presented, showing that, in the average case, Algorithm Partial Route is nearly optimal for relatively small values of k.