3rd Euromicro Workshop on Parallel and Distributed Processing
Algorithm-based fault-tolerant programming in scientific computation on multiprocessors
San Remo, Italy, January 25-27, 1995
ISBN: 0-8186-7031-2
Efficient parallel algorithms proposed for many fundamental problems in scientific computation are sensitive to processor failures. Because of its low cost, algorithm-based fault tolerance is an attractive way to introduce fault tolerance into existing multiprocessors. To facilitate fault-tolerant programming in scientific computation, we have modified and further developed an existing parallel run-time environment. This paper primarily examines how known error-processing techniques can be tuned to the algorithm-based approach. It also studies the design issues of the implementation and the execution-time overhead of a fault-tolerant application in our run-time environment. In contrast to many other environments for parallel fault-tolerant programming, which use the master/slave programming model, our environment makes it possible to add fault tolerance to existing parallel applications in scientific computation.
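The abstract does not say which algorithm-based scheme the paper builds on. Purely as an illustration of the general idea, the sketch below shows the textbook checksum-encoded matrix multiplication: the inputs are extended with checksum rows and columns, the product carries its own checksums, and a single corrupted element can be located and corrected afterwards. This is not necessarily the technique used in the paper; all function names are hypothetical, the code runs sequentially on one processor, and at most one faulty element is assumed.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product (stands in for the parallel kernel)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add_column_checksum(a: Matrix) -> Matrix:
    """Append a row holding the sum of each column of a."""
    return [list(r) for r in a] + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b: Matrix) -> Matrix:
    """Append a column holding the sum of each row of b."""
    return [list(r) + [sum(r)] for r in b]


def checksum_product(a: Matrix, b: Matrix) -> Matrix:
    """Full-checksum product: last row/column hold checksums of the result."""
    return matmul(add_column_checksum(a), add_row_checksum(b))


def find_and_correct(cf: Matrix, tol: float = 1e-9) -> Optional[Tuple[int, int]]:
    """Check cf in place; fix a single faulty element and return its position.

    Returns None if all row and column checksums agree. The corner element
    (checksum of checksums) is not verified by this sketch.
    """
    n_rows, n_cols = len(cf) - 1, len(cf[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if abs(sum(cf[i][:n_cols]) - cf[i][n_cols]) > tol]
    bad_cols = [j for j in range(n_cols)
                if abs(sum(cf[i][j] for i in range(n_rows)) - cf[n_rows][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The row checksum tells us how far the data element is off.
        cf[i][j] += cf[i][n_cols] - sum(cf[i][:n_cols])
        return (i, j)
    if len(bad_rows) == 1 and not bad_cols:
        i = bad_rows[0]
        cf[i][n_cols] = sum(cf[i][:n_cols])  # the row checksum itself was hit
        return (i, n_cols)
    if len(bad_cols) == 1 and not bad_rows:
        j = bad_cols[0]
        cf[n_rows][j] = sum(cf[i][j] for i in range(n_rows))  # column checksum hit
        return (n_rows, j)
    raise ValueError("error pattern not correctable by a single-error checksum scheme")


def strip_checksums(cf: Matrix) -> Matrix:
    """Drop the checksum row and column, leaving the plain product."""
    return [row[:-1] for row in cf[:-1]]


if __name__ == "__main__":
    cf = checksum_product([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    cf[1][0] = 99  # simulate a processor fault corrupting one result element
    print(find_and_correct(cf), strip_checksums(cf))  # (1, 0) [[19, 22], [43, 50]]
```

The appeal of such schemes, and the reason the abstract stresses their low cost, is that the checksums add only one extra row and column of work instead of duplicating the whole computation.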
Index Terms:
parallel algorithms; multiprocessing systems; software fault tolerance; parallel programming; programming environments; algorithm-based fault-tolerant programming; scientific computation; multiprocessors; parallel run-time environment; error processing techniques; execution time overhead; master/slave programming model
Citation:
J. Altmann, A. Bohm, "Algorithm-based fault-tolerant programming in scientific computation on multiprocessors," 3rd Euromicro Workshop on Parallel and Distributed Processing (PDP), pp. 374, 1995.