1999 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN '99)
Performance of Fault Tolerant Networks of Workstations
Fremantle, Australia
June 23-25, 1999
ISBN: 0-7695-0231-8
Functional or dataflow models of computation enable a program's run-time system to determine which portions of a computation must be repeated when faults occur. Straightforward modifications to the run-time system of Cilk 2.0, a threaded extension of C, enable a Network of Workstations parallel processing system to tolerate fail-stop faults of the individual processors or of the network. This work shows that the overheads needed to provide this fault tolerance are mainly CPU cycles and memory, with very little additional network load generated in the absence of faults. This makes it feasible to run long computations successfully on the typical networks of workstations found in large organisations, where the ownership, control and location of the individual processors may be widely distributed.
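The central observation is that, in a functional (dataflow) program, a task's result depends only on its inputs. The run-time can therefore recover from a fail-stop fault by re-running only those tasks whose results never arrived. The Python sketch below illustrates that idea in a single-process simulation.

It is not Cilk 2.0's run-time or the paper's implementation. The function name `run_fault_tolerant`, the `fail_after` parameter and the round-robin scheduling are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task representation: a pure function and its single argument.
Task = Tuple[Callable[[int], int], int]


def run_fault_tolerant(tasks: Dict[int, Task], n_workers: int,
                       fail_after: Dict[int, int]) -> Tuple[Dict[int, int], int]:
    """Run pure tasks on simulated workers that may fail-stop.

    fail_after[w] is the number of tasks worker w completes before it halts;
    workers not listed never fail. Returns (results, total task executions).
    """
    # The parent keeps every task description in `tasks` so that any task can
    # be re-issued; in this sketch that retained state is the memory overhead,
    # and no extra messages are modelled when no fault occurs.
    results: Dict[int, int] = {}
    pending: List[int] = sorted(tasks)
    live: List[int] = list(range(n_workers))
    done_count = {w: 0 for w in live}
    executions = 0

    while pending:
        if not live:
            raise RuntimeError("all workers have failed")
        # Deal the pending tasks round-robin across the live workers.
        assignment: Dict[int, List[int]] = {w: [] for w in live}
        for i, tid in enumerate(pending):
            assignment[live[i % len(live)]].append(tid)

        lost: List[int] = []
        for w in list(live):
            for k, tid in enumerate(assignment[w]):
                limit = fail_after.get(w)
                if limit is not None and done_count[w] >= limit:
                    # Fail-stop: w halts silently, and every task it still
                    # holds is lost. Results it already returned are kept.
                    live.remove(w)
                    lost.extend(assignment[w][k:])
                    break
                fn, arg = tasks[tid]
                results[tid] = fn(arg)  # result delivered to the parent
                done_count[w] += 1
                executions += 1
        # Tasks are pure, so re-executing the lost ones elsewhere is safe and
        # only the tasks whose results never arrived are repeated.
        pending = sorted(lost)
    return results, executions


if __name__ == "__main__":
    squares = {i: (lambda x: x * x, i) for i in range(10)}
    results, runs = run_fault_tolerant(squares, n_workers=3, fail_after={1: 1})
    print(results[9], runs)  # 81 10
```

In this example worker 1 halts after one task. Its two unfinished tasks are re-issued to the surviving workers, and the final results match a fault-free run.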
Index Terms:
Fault Tolerance; Networks of Workstations; Dataflow
Citation:
John Morris, "Performance of Fault Tolerant Networks of Workstations," in 1999 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN '99), Fremantle, Australia, 1999, p. 125.