15th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP'07)
Functional Tests of the RADIC Fault Tolerance Architecture
Naples, Italy
February 7-9, 2007
ISBN: 0-7695-2784-1
Angelo Duarte, Universidad Autonoma de Barcelona, Spain
Dolores Rexachs, Universidad Autonoma de Barcelona, Spain
Emilio Luque, Universidad Autonoma de Barcelona, Spain
Clusters with thousands of nodes are a reality, and the current trend indicates that they are becoming larger. Such large clusters are subject to a relatively high fault frequency, so a fault-tolerance scheme is mandatory to ensure that applications complete correctly. Message passing is the programming model often used in large clusters, yet current implementations of fault tolerance for message-passing systems do not focus on an architecture that simultaneously addresses scalability, transparency, and independence from stable or central elements. The RADIC architecture was proposed and designed as a fully distributed structure to meet these requirements. It defines a fully distributed fault-tolerance controller implemented by a set of system processes that collaborate to perform all the basic functions of a fault-tolerance protocol. This paper presents the test methodology used to verify the functionality of the RADIC architecture using RADICMPI, a prototype built on MPI semantics.
Citation:
Angelo Duarte, Dolores Rexachs, Emilio Luque, "Functional Tests of the RADIC Fault Tolerance Architecture," pdp, pp.278-287, 15th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP'07), 2007