On-Line Fault Monitoring

Bibliographic Details
Published in: Journal of Electronic Testing, Vol. 12, No. 1-2, pp. 21-27
Main Author: Stiffler, J.J.
Format: Journal Article
Language: English
Published: Boston: Springer Nature B.V., 01.02.1998
ISSN: 0923-8174; 1573-0727
DOI: 10.1023/A:1008201032535

Summary: Sequoia's fault-tolerant computers were designed subject to some rather rigid constraints: no single hardware malfunction can generate an undetected error; an integrated circuit is a "black box" that can fail in arbitrary ways, affecting an arbitrary subset of input and output signals; faults can be transient or intermittent, with arbitrary durations and repetition intervals. Moreover, the incremental hardware used to achieve these goals was to be kept to a minimum. The resulting computers do, to a very large extent, satisfy these constraints. To achieve this, a combination of fault-monitoring techniques was used, including bit and nibble error-correcting and error-detecting codes; byte parity codes with orthogonal partitioning; cyclic-residue codes on I/O data transfers; codes designed to protect against address-counter overruns on I/O transfers; and lossless control-signal compactors. The nature and rationale for these various fault monitors are described, as well as the analytical and testing techniques used to estimate the resulting coverage.
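
To give a flavor of one of the checks named in the abstract, the sketch below shows a generic cyclic-residue check on an I/O block transfer. It is an illustration only, not Sequoia's actual design: the modulus (2^8 - 1 = 255), the residue width and placement, the helper residue_mod255, and the simulated single-byte corruption are all assumptions made for this example.

/*
 * Minimal sketch (assumed details, not Sequoia's actual code): a
 * cyclic-residue check on an I/O block transfer using modulus
 * 2^8 - 1 = 255.  The sender computes the block's residue mod 255;
 * the receiver recomputes it after the transfer and flags a mismatch.
 */
#include <stdint.h>
#include <stdio.h>

/* Sum the block's bytes modulo 255.  Since 256 == 1 (mod 255), the
 * carry byte can simply be folded back into the low byte. */
static uint8_t residue_mod255(const uint8_t *buf, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc += buf[i];
        acc = (acc & 0xFFu) + (acc >> 8);   /* fold the carry back in */
    }
    acc = (acc & 0xFFu) + (acc >> 8);       /* final fold */
    return (uint8_t)(acc == 0xFFu ? 0 : acc);   /* 255 == 0 (mod 255) */
}

int main(void)
{
    uint8_t block[] = "example I/O transfer payload";
    size_t  len     = sizeof block - 1;         /* exclude the NUL */

    uint8_t sent = residue_mod255(block, len);  /* residue sent with the block */

    block[7] ^= 0x10;                           /* simulate a bit flip in transit */

    uint8_t recomputed = residue_mod255(block, len);
    printf("residue sent 0x%02X, recomputed 0x%02X -> %s\n",
           sent, recomputed,
           sent == recomputed ? "transfer accepted" : "error detected");
    return 0;
}

Any single corrupted byte changes the sum by a nonzero amount smaller than 255 in magnitude, so it always changes the residue and is detected; multi-byte errors are detected with high probability. The other monitors listed in the abstract (parity with orthogonal partitioning, address-counter protection, control-signal compactors) are hardware mechanisms not detailed in this record.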