13th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP'05)
A Fault Tolerant MPI-IO Implementation using the Expand Parallel File System
Lugano, Switzerland
February 09-February 12
ISBN: 0-7695-2280-7
A. Calderón, Universidad Carlos III de Madrid, Spain
F. García-Carballeira, Universidad Carlos III de Madrid, Spain
J. Carretero, Universidad Carlos III de Madrid, Spain
J. M. Pérez, Universidad Carlos III de Madrid, Spain
L. M. Sánchez, Universidad Carlos III de Madrid, Spain
Parallelism in file systems is obtained by using several independent server nodes, each supporting one or more secondary storage devices. This approach increases the performance and scalability of the system, but a fault in a single node can stop the whole system. To avoid this problem, data must be stored using some kind of redundancy technique, so that any data stored in a faulty element can be recovered. Fault tolerance can be provided in I/O systems using replication or RAID-based schemes. However, most current systems apply the same technique to all files in the system.
This paper describes the fault tolerance support provided by Expand, a parallel file system based on standard servers. Expand allows different fault-tolerance mechanisms to be defined at the file level. The evaluation compares the performance of Expand under different configurations with that of PVFS, using the FLASH-I/O benchmark.
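The RAID-based redundancy mentioned in the abstract can be illustrated with a minimal sketch of XOR parity across server nodes. This is a generic illustration of the technique, not Expand's implementation: the function names (xor_blocks, stripe_with_parity, recover_block), the single-stripe layout, and the block size are all assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, num_data_nodes, block_size):
    """Split data into one block per data node and append one parity block.

    Hypothetical single-stripe layout: a real file system would write many
    stripes and typically rotate the parity block across nodes.
    """
    capacity = num_data_nodes * block_size
    if len(data) > capacity:
        raise ValueError("data does not fit in a single stripe")
    padded = data.ljust(capacity, b"\0")  # zero-pad the last block
    blocks = [padded[i * block_size:(i + 1) * block_size]
              for i in range(num_data_nodes)]
    return blocks + [xor_blocks(blocks)]


def recover_block(stripe, failed_index):
    """Rebuild the block of a failed node by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Four data nodes plus one parity node; node 2 is assumed to have failed.
stripe = stripe_with_parity(b"parallel file system", num_data_nodes=4, block_size=5)
lost = stripe[2]
rebuilt = recover_block(stripe, failed_index=2)
```

With one parity block per stripe, any single failed node (data or parity) can be reconstructed from the survivors. Replication trades more storage for simpler recovery, which is one reason a per-file choice of mechanism can be useful.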
Index Terms:
Parallel File System, NFS, data declustering, clusters, Fault-Tolerance
Citation:
A. Calderón, F. García-Carballeira, J. Carretero, J. M. Pérez, L. M. Sánchez, "A Fault Tolerant MPI-IO Implementation using the Expand Parallel File System," pdp, pp. 274-281, 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP'05), 2005