Cluster Computing and the Grid, IEEE International Symposium on (2008)
May 19, 2008 to May 22, 2008
As parallel file systems span ever larger numbers of nodes to provide the performance and scalability that modern cluster applications require, the need for fault-tolerant, highly available file systems has arisen. Modern parallel file systems spanning tens, hundreds, or even thousands of servers require fault tolerance to avoid job failure and catastrophic data loss caused by a single disk failure or server loss. Effective fault tolerance in parallel file systems must provide a high degree of data resiliency and consistency along with scalable performance. In this paper, we describe a data replication technique that meets the resiliency and consistency requirements of parallel file systems while providing scalable performance. We measure the performance of the proposed mechanism by implementing it in a popular parallel file system, PVFS.
Parallel File Systems, Replication, Mirroring, Fault Tolerance
B. W. Settlemyer and W. B. Ligon III, "A Technique for Lock-Less Mirroring in Parallel File Systems," 2008 8th International Symposium on Cluster Computing and the Grid (CCGRID '08), Lyon, 2008, pp. 801-806.