On-Line Fault Monitoring

Bibliographic Details
Published in: Journal of Electronic Testing, Vol. 12, No. 1-2, pp. 21-27
Main Author: Stiffler, J.J.
Format: Journal Article
Language: English
Published: Boston: Springer Nature B.V., 01.02.1998
ISSN: 0923-8174; 1573-0727
DOI: 10.1023/A:1008201032535

Summary: Sequoia's fault-tolerant computers were designed subject to some rather rigid constraints: no single hardware malfunction can generate an undetected error; an integrated circuit is a "black box" that can fail in arbitrary ways, affecting an arbitrary subset of input and output signals; faults can be transient or intermittent, with arbitrary durations and repetition intervals. Moreover, the incremental hardware used to achieve these goals was to be kept to a minimum. The resulting computers do, to a very large extent, satisfy these constraints. To achieve this, a combination of fault-monitoring techniques was used, including bit and nibble error-correcting and error-detecting codes; byte parity codes with orthogonal partitioning; cyclic-residue codes on I/O data transfers; codes designed to protect against address-counter overruns on I/O transfers; and lossless control-signal compactors. The nature and rationale for these various fault monitors are described, as well as the analytical and testing techniques used to estimate the resulting coverage.
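
To give a flavor of one of the checks named in the abstract, the sketch below shows a generic cyclic-residue check on an I/O block transfer. It is an illustration only, not Sequoia's actual design: the modulus (2^8 - 1 = 255), the residue width and placement, the helper residue_mod255, and the simulated single-byte corruption are all assumptions made for this example.

/*
 * Minimal sketch (assumed details, not Sequoia's actual code): a
 * cyclic-residue check on an I/O block transfer using modulus
 * 2^8 - 1 = 255.  The sender computes the block's residue mod 255;
 * the receiver recomputes it after the transfer and flags a mismatch.
 */
#include <stdint.h>
#include <stdio.h>

/* Sum the block's bytes modulo 255.  Since 256 == 1 (mod 255), the
 * carry byte can simply be folded back into the low byte. */
static uint8_t residue_mod255(const uint8_t *buf, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc += buf[i];
        acc = (acc & 0xFFu) + (acc >> 8);   /* fold the carry back in */
    }
    acc = (acc & 0xFFu) + (acc >> 8);       /* final fold */
    return (uint8_t)(acc == 0xFFu ? 0 : acc);   /* 255 == 0 (mod 255) */
}

int main(void)
{
    uint8_t block[] = "example I/O transfer payload";
    size_t  len     = sizeof block - 1;         /* exclude the NUL */

    uint8_t sent = residue_mod255(block, len);  /* residue sent with the block */

    block[7] ^= 0x10;                           /* simulate a bit flip in transit */

    uint8_t recomputed = residue_mod255(block, len);
    printf("residue sent 0x%02X, recomputed 0x%02X -> %s\n",
           sent, recomputed,
           sent == recomputed ? "transfer accepted" : "error detected");
    return 0;
}

Any single corrupted byte changes the sum by a nonzero amount smaller than 255 in magnitude, so it always changes the residue and is detected; multi-byte errors are detected with high probability. The other monitors listed in the abstract (parity with orthogonal partitioning, address-counter protection, control-signal compactors) are hardware mechanisms not detailed in this record.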