3rd Euromicro Workshop on Parallel and Distributed Processing
Fault-tolerance on regular decomposition grid applications
San Remo, Italy
January 25-27, 1995
ISBN: 0-8186-7031-2
L.M. Silva, Dept. de Engenharia Informatica, Coimbra Univ., Portugal
J.G. Silva, Dept. de Engenharia Informatica, Coimbra Univ., Portugal
S. Chapple, Dept. de Engenharia Informatica, Coimbra Univ., Portugal
L. Clarke, Dept. de Engenharia Informatica, Coimbra Univ., Portugal
Writing parallel applications is considerably more complex than writing sequential ones because of additional problems not found in the sequential environment. The main problems include communication, synchronization, data partitioning and distribution, mapping of processes, heterogeneity, and fault tolerance. Fault tolerance is a very important feature in parallel and distributed systems because the mean time between failures of the system decreases as the number of processors grows, and the failure of just one process or processor can crash the entire application. This paper presents a parallel library (PUL-RD) that addresses most of these problems and provides support for fault tolerance. The original version of the library offers portable, high-level support for parallelism and can be used to write grid-based parallel applications that have a regular decomposition. We describe the fault-tolerance features that were incorporated into PUL-RD, giving special attention to the functionality of the checkpointing scheme.
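The abstract's point that the system's mean time between failures (MTBF) falls as processors are added can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the paper: it assumes processors fail independently with exponentially distributed failure times, and the function names and the example figures are hypothetical.

```python
import math


def system_mtbf(node_mtbf_hours: float, num_processors: int) -> float:
    """MTBF of the whole machine, assuming independent exponential failures.

    Under that assumption the failure rates add, so the time to the first
    failure anywhere in the system has mean node_mtbf / N.
    """
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    return node_mtbf_hours / num_processors


def prob_run_survives(node_mtbf_hours: float, num_processors: int,
                      run_hours: float) -> float:
    """Probability that no processor fails during a run of the given length."""
    return math.exp(-run_hours / system_mtbf(node_mtbf_hours, num_processors))


if __name__ == "__main__":
    # Hypothetical figures: each processor fails once per 10,000 hours on average.
    for n in (1, 16, 256):
        print(n, system_mtbf(10_000.0, n), prob_run_survives(10_000.0, n, 24.0))
```

Under this model, a long-running job on many processors is unlikely to finish without some failure, which is the motivation for saving checkpoints from which the application can restart.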
Index Terms:
synchronisation; fault tolerant computing; software fault tolerance; fault-tolerance; regular decomposition grid applications; communication; data partitioning; distribution; heterogeneity; high-level support; PUL-RD; checkpointing
Citation:
L.M. Silva, J.G. Silva, S. Chapple, L. Clarke, "Fault-tolerance on regular decomposition grid applications," pdp, pp.358, 3rd Euromicro Workshop on Parallel and Distributed Processing, 1995