2011 International Conference on Parallel Processing (2011)
Taipei City, Taiwan
Sept. 13, 2011 to Sept. 16, 2011
ISSN: 0190-3918
ISBN: 978-0-7695-4510-3
pp: 375-384
Checkpoint/Restart (C/R) mechanisms have been widely adopted by many MPI libraries [1-3] to achieve fault tolerance. However, a major limitation of such mechanisms is the severe I/O bottleneck caused by the need to dump the snapshots of all processes to persistent storage. Several studies have sought to minimize this overhead [4, 5], but most of the proposed optimizations are implemented inside a specific MPI stack, checkpointing library, or application, and hence are not portable to other MPI stacks and applications. In this paper, we propose a filesystem-based approach to alleviate this checkpoint I/O bottleneck. We propose a new filesystem, named Checkpoint-Restart Filesystem (CRFS), which is a lightweight user-level filesystem based on FUSE (Filesystem in Userspace). CRFS is designed with Checkpoint/Restart I/O traffic in mind to efficiently handle concurrent write requests. Any software component using standard filesystem interfaces can transparently benefit from CRFS's capabilities. CRFS intercepts the checkpoint file write system calls and aggregates them into fewer, larger chunks, which are asynchronously written to the underlying filesystem for more efficient I/O. CRFS manages a flexible internal I/O thread pool to throttle concurrent I/O, which alleviates I/O contention and improves I/O performance. CRFS can be mounted over any standard filesystem, such as ext3, NFS, and Lustre. We have implemented CRFS and evaluated its performance using three popular C/R-capable MPI stacks: MVAPICH2, MPICH2, and OpenMPI. Experimental results show significant performance gains for all three MPI stacks. CRFS achieves up to 5.5X speedup in checkpoint writing performance to the Lustre filesystem. Similar levels of improvement are also obtained with the ext3 and NFS filesystems. To the best of our knowledge, this is the first such portable and lightweight filesystem designed for generic Checkpoint/Restart data.
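The write-aggregation idea described in the abstract can be illustrated with a minimal sketch. This is not the CRFS implementation (which is a FUSE filesystem); it only shows, under simplifying assumptions, how many small writes can be coalesced into fixed-size chunks and handed to a bounded thread pool. The class name `AggregatingWriter`, the `sink` callback, and the chunk and pool sizes are all hypothetical and chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AggregatingWriter:
    """Hypothetical sketch: coalesce small writes into larger chunks and
    flush them asynchronously through a bounded thread pool."""

    def __init__(self, sink, chunk_size, io_threads):
        self.sink = sink                  # callable(offset, data); stands in for the backing filesystem
        self.chunk_size = chunk_size      # size of each aggregated chunk in bytes
        self.pool = ThreadPoolExecutor(max_workers=io_threads)  # caps concurrent I/O
        self.buf = bytearray()            # data not yet handed to the pool
        self.offset = 0                   # file offset of the first byte in buf
        self.pending = []                 # futures for chunks still in flight
        self.flushed_chunks = 0

    def write(self, data):
        # Buffer the small write; emit a chunk whenever enough data has accumulated.
        self.buf += data
        while len(self.buf) >= self.chunk_size:
            self._submit(self.chunk_size)
        return len(data)

    def _submit(self, n):
        chunk = bytes(self.buf[:n])
        del self.buf[:n]
        self.pending.append(self.pool.submit(self.sink, self.offset, chunk))
        self.offset += n
        self.flushed_chunks += 1

    def close(self):
        # Flush the final partial chunk, then wait for all outstanding writes.
        if self.buf:
            self._submit(len(self.buf))
        for future in self.pending:
            future.result()
        self.pool.shutdown()


# Usage: an in-memory dict stands in for the underlying filesystem.
chunks = {}
lock = threading.Lock()

def store(offset, data):
    with lock:
        chunks[offset] = data

w = AggregatingWriter(store, chunk_size=1024, io_threads=2)
for i in range(100):
    w.write(bytes([i]) * 100)   # 100 small writes of 100 bytes each
w.close()
print(w.flushed_chunks, "chunks written for 100 application writes")
```

In this sketch the application issues 100 writes, while the sink sees only the aggregated chunks, and the pool size bounds how many of those chunk writes can be in flight at once.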
checkpoint-restart, userspace filesystem, write aggregation

R. Rajachandrasekar, J. Huang, H. Wang, X. Besseron, X. Ouyang and D. K. Panda, "CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart," 2011 International Conference on Parallel Processing (ICPP), Taipei City, Taiwan, 2011, pp. 375-384.