Seventeenth Workshop on Parallel and Distributed Simulation (PADS 2003), Proceedings
San Diego, California
June 10, 2003 to June 13, 2003
Francesco Quaglia, Università di Roma "La Sapienza"
Andrea Santoro, Università di Roma "La Sapienza"
CCL (Checkpointing and Communication Library) is a recently developed software library supporting optimistic parallel simulation on Myrinet-based clusters. Beyond classical low-latency message delivery, the library implements CPU-offloaded, semi-asynchronous checkpointing based on the data transfer capabilities of a programmable DMA engine on board Myrinet network cards. The latest version of CCL (v2.4), designed for M2M-PCI32C Myrinet cards, supports only monoprogrammed semi-asynchronous checkpoints. This forces resynchronization between CPU and DMA activities whenever the simulation application issues a new checkpoint request while the previously issued one is still being carried out by the DMA engine. In this paper we present CCL v3.0, which exploits hardware features of the more advanced M3M-PCI64C Myrinet cards to support multiprogrammed semi-asynchronous checkpoints. The multiprogrammed approach allows a higher degree of concurrency between checkpointing and the other simulation-specific operations carried out by the CPU, which benefits performance. We also report an evaluation of those benefits for a personal communication system simulation application.
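The difference between the monoprogrammed and multiprogrammed schemes can be illustrated with a toy timing model. The sketch below is not CCL code and does not use the CCL API. The function name `cpu_stall_time`, the `slots` parameter (how many checkpoint requests may be outstanding at once), and the assumption that the DMA engine serves requests one at a time in FIFO order are all hypothetical choices made for illustration. With `slots=1` the CPU must resynchronize with the DMA engine whenever a checkpoint is still in flight, which corresponds to the monoprogrammed case. Larger values model the multiprogrammed case.

```python
from collections import deque


def cpu_stall_time(work_intervals, dma_time, slots):
    """Total time the CPU spends waiting for the DMA engine (toy model).

    work_intervals: CPU work done before each checkpoint request.
    dma_time:       time the DMA engine needs per checkpoint (assumed constant).
    slots:          max outstanding checkpoint requests (1 = monoprogrammed).
    """
    now = 0.0          # current CPU time
    dma_free_at = 0.0  # time at which the DMA engine finishes its queued work
    pending = deque()  # completion times of checkpoints still in flight
    stalled = 0.0

    for work in work_intervals:
        now += work  # simulation work preceding the next checkpoint request

        # Retire checkpoints that the DMA engine has already completed.
        while pending and pending[0] <= now:
            pending.popleft()

        # All slots busy: the CPU resynchronizes with the oldest transfer.
        if len(pending) == slots:
            stalled += pending[0] - now
            now = pending.popleft()

        # Queue the new checkpoint behind any transfer still in progress.
        start = max(now, dma_free_at)
        dma_free_at = start + dma_time
        pending.append(dma_free_at)

    return stalled


if __name__ == "__main__":
    bursts = [1, 1, 5, 1, 1, 5]  # hypothetical workload, arbitrary time units
    for slots in (1, 2, 3):
        print(slots, cpu_stall_time(bursts, dma_time=3, slots=slots))
```

In this model the stall time shrinks as more requests may be outstanding, because bursts of closely spaced checkpoint requests no longer block the CPU. The workload and timings are made up and say nothing about the measured results reported in the paper.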
F. Quaglia and A. Santoro, "CCL v3.0: Multiprogrammed Semi-Asynchronous Checkpoints," Seventeenth Workshop on Parallel and Distributed Simulation (PADS 2003), Proceedings, San Diego, California, 2003, p. 21.