2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation (2003)
San Diego, California
June 10, 2003 to June 13, 2003
Francesco Quaglia, Università di Roma "La Sapienza"
Andrea Santoro, Università di Roma "La Sapienza"
CCL (Checkpointing and Communication Library) is a recently developed software library that supports optimistic parallel simulation on Myrinet-based clusters. Beyond classical low-latency message delivery, the library implements CPU-offloaded, semi-asynchronous checkpointing based on the data transfer capabilities of a programmable DMA engine on board Myrinet network cards. The latest version of CCL (v2.4), designed for M2M-PCI32C Myrinet cards, supports only monoprogrammed semi-asynchronous checkpoints. This forces the CPU and the DMA engine to resynchronize whenever the simulation application issues a new checkpoint request while the previously issued one is still being carried out by the DMA engine. In this paper we present CCL v3.0, which exploits hardware features of the more advanced M3M-PCI64C Myrinet cards to support multiprogrammed semi-asynchronous checkpoints. The multiprogrammed approach allows a higher degree of concurrency between checkpointing and the other simulation-specific operations carried out by the CPU, which benefits performance. We also report an evaluation of those benefits for a personal communication system simulation application.
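The difference between the monoprogrammed and multiprogrammed schemes can be illustrated with a toy timing model. The sketch below is not CCL code and does not use any CCL or Myrinet API. The function name `cpu_stall_time`, its parameters, the constant per-checkpoint DMA time, and the FIFO servicing order are all assumptions made for illustration. It only counts how long the CPU would wait when at most `max_outstanding` checkpoint requests may be pending on the DMA engine, with `max_outstanding=1` standing in for the monoprogrammed case.

```python
from collections import deque


def cpu_stall_time(work_intervals, dma_time, max_outstanding):
    """Total time the CPU waits on the DMA engine (hypothetical toy model).

    work_intervals: CPU work time preceding each checkpoint request.
    dma_time: time the DMA engine needs per checkpoint (assumed constant).
    max_outstanding: number of requests that may be pending at once;
        1 models the monoprogrammed case, larger values the multiprogrammed one.
    """
    now = 0.0              # CPU clock
    stall = 0.0            # accumulated CPU waiting time
    dma_free_at = 0.0      # time at which the DMA engine finishes queued work
    completions = deque()  # completion times of outstanding requests (FIFO)

    for work in work_intervals:
        now += work
        # Retire requests the DMA engine has already completed.
        while completions and completions[0] <= now:
            completions.popleft()
        # No free slot: the CPU must wait for the oldest request to finish.
        if len(completions) >= max_outstanding:
            wait_until = completions.popleft()
            stall += wait_until - now
            now = wait_until
        # Issue the new request; the DMA engine serves requests one at a time.
        start = max(now, dma_free_at)
        dma_free_at = start + dma_time
        completions.append(dma_free_at)
    return stall


if __name__ == "__main__":
    requests = [1, 1, 1, 1]  # one time unit of CPU work before each checkpoint
    for depth in (1, 2, 4):
        print(depth, cpu_stall_time(requests, dma_time=3, max_outstanding=depth))
```

In this model the CPU stalls whenever checkpoints are requested faster than the DMA engine completes them and no request slot is free. Allowing more outstanding requests lets the CPU keep working while the backlog drains, which is the kind of added concurrency the abstract attributes to the multiprogrammed approach.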
Francesco Quaglia, Andrea Santoro, "CCL v3.0: Multiprogrammed Semi-Asynchronous Checkpoints", 2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation, pp. 21, 2003, doi:10.1109/PADS.2003.1207417