Issue No.06 - June (1994 vol.5)
pp: 649-653
ABSTRACT
<p>Considers the applicability of algorithm-based fault tolerance (ABFT) to massively parallel scientific computation. Existing ABFT schemes can provide effective fault tolerance at a low cost for computation on matrices of moderate size; however, the methods do not scale well to floating-point operations on large systems. This short note proposes the use of a partitioned linear encoding scheme to provide scalability. Matrix algorithms employing this scheme are presented and compared to current ABFT schemes. It is shown that the partitioned scheme provides scalable linear codes with improved numerical properties at the cost of only a small increase in hardware and time overhead.</p>
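The abstract contrasts classic checksum-based ABFT with a partitioned linear encoding. As a minimal sketch, assuming a simple partitioning in which the rows of A are split into contiguous blocks of `block` rows and each block receives its own column-checksum row, the Python below shows how linearity carries those checksums through a matrix product and how a mismatch localizes an error to one partition. The function names (`matmul`, `encode_partitioned`, `check_partitioned`) and the relative tolerance `tol` are hypothetical illustrations, not the encoding or thresholds defined in the paper.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product (stands in for the parallel kernel)."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)]
            for row in a]


def encode_partitioned(a: Matrix, block: int) -> Matrix:
    """Append one column-checksum row after each block of `block` data rows.

    Hypothetical scheme for illustration: with block == len(a) this reduces
    to the classic single checksum row.
    """
    encoded: Matrix = []
    for start in range(0, len(a), block):
        rows = a[start:start + block]
        encoded.extend(row[:] for row in rows)
        encoded.append([sum(col) for col in zip(*rows)])
    return encoded


def check_partitioned(c: Matrix, block: int, n_rows: int,
                      tol: float = 1e-9) -> List[int]:
    """Return the indices of partitions whose checksum row disagrees with
    the sum of its data rows, using a relative floating-point tolerance."""
    bad: List[int] = []
    pos = 0
    for part, start in enumerate(range(0, n_rows, block)):
        size = min(block, n_rows - start)
        rows = c[pos:pos + size]
        checksum = c[pos + size]
        for j, col in enumerate(zip(*rows)):
            if abs(sum(col) - checksum[j]) > tol * max(1.0, abs(checksum[j])):
                bad.append(part)
                break
        pos += size + 1  # skip this partition's data rows and its checksum row
    return bad


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    b = [[1.0, 0.5], [2.0, -1.0]]
    c_enc = matmul(encode_partitioned(a, block=2), b)
    print(check_partitioned(c_enc, block=2, n_rows=4))  # no partition flagged
    c_enc[4][1] += 10.0  # simulate a transient error in partition 1
    print(check_partitioned(c_enc, block=2, n_rows=4))  # partition 1 flagged
```

Setting `block` equal to the number of rows gives the single-checksum-row code. Smaller blocks mean each checksum sums fewer terms, which is one intuition for why partitioning can limit rounding error in the floating-point comparison and let errors in different partitions be detected separately, at the cost of extra checksum rows.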
INDEX TERMS
fault tolerant computing; software reliability; error correction codes; error detection codes; parallel architectures; matrix algebra; algorithm based fault tolerance; massively parallel systems; partitioned encoding; ABFT; scalability; matrix algorithms; partitioned scheme; checksum code; error detection; error correction; transient errors
CITATION
J. Rexford, N.K. Jha, "Partitioned Encoding Schemes for Algorithm-Based Fault Tolerance in Massively Parallel Systems", IEEE Transactions on Parallel & Distributed Systems, vol.5, no. 6, pp. 649-653, June 1994, doi:10.1109/71.285610