Issue No.06 - June (2003 vol.14)
<p><b>Abstract</b>: This paper describes a <i>nonblocking checkpointing</i> mode in support of optimistic parallel discrete event simulation. This mode allows real concurrency between state saving and other simulation-specific operations (e.g., event list update, event execution), with the aim of removing the cost of recording state information from the completion time of the parallel simulation application. We present an implementation of a C library supporting nonblocking checkpointing on a Myrinet-based cluster, which demonstrates the practical viability of this checkpointing mode on standard off-the-shelf hardware. Through an empirical study on classical parameterized synthetic benchmarks, we show that, except for applications with minimal state granularity, nonblocking checkpointing improves the speed of the parallel execution compared to commonly adopted, optimized checkpointing methods based on the classical blocking mode. A performance study of a Personal Communication System (PCS) simulation is additionally reported to point out the benefits of nonblocking checkpointing for a real-world application.</p>
Parallel discrete-event simulation, optimistic synchronization, checkpointing, Myrinet, DMA, performance optimization.
Francesco Quaglia, "Nonblocking Checkpointing for Optimistic Parallel Simulation: Description and an Implementation", IEEE Transactions on Parallel & Distributed Systems, vol. 14, no. 6, pp. 593-610, June 2003, doi:10.1109/TPDS.2003.1206506