Ninth IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'01)
Tuning of the Checkpointing and Communication Library for Optimistic Simulation on Myrinet Based NOWs
Cincinnati, Ohio
August 15-August 18
ISBN: 0-7695-1315-8
Francesco Quaglia, Università di Roma "La Sapienza"
Andrea Santoro, Università di Roma "La Sapienza"
Bruno Ciciani, Università di Roma "La Sapienza"
Abstract: A Checkpointing and Communication Library (CCL) for optimistic simulation on Myrinet-based Networks of Workstations (NOWs) was recently presented. CCL offloads checkpoint operations from the CPU by charging them to a programmable DMA engine on the Myrinet network card. CCL also includes functions for freezing the simulation application on demand, which can be used to maintain data consistency (for example, when a state buffer must be modified while a DMA-based checkpoint operation involving it is still in progress). Programming the DMA to perform a checkpoint operation by transferring large data blocks in a single burst keeps the latency of each checkpoint operation low, which reduces the probability that the application actually has to be frozen. On the other hand, transferring large data blocks in a single burst might interfere with communication, since that DMA engine (and other circuitry) cannot be used for communication until the current data transfer has completed. In this paper we identify in detail the effects of the burst length, and from them we outline a set of relevant phenomena to take into account when choosing a suitable compile-time value for the burst length. We also report measurements quantifying these phenomena for the case of a PC cluster. The data indicate that communication does not suffer from the use of non-minimal burst lengths for checkpoint operations, showing that CCL, if well tuned, provides highly effective, CPU-offloaded checkpointing.
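The burst-length trade-off described in the abstract can be illustrated with a minimal sketch. This is not the paper's model or the CCL API: the cost model (a fixed setup cost per burst plus a bandwidth-limited transfer) and all names and parameter values are assumptions chosen only to show the two opposing effects.

```python
import math


def checkpoint_latency_us(state_bytes, burst_bytes, setup_us, bytes_per_us):
    """Time to copy a state buffer with fixed-size DMA bursts.

    Assumed model (hypothetical): each burst pays a fixed setup cost,
    and the payload moves at a constant bandwidth.
    """
    bursts = math.ceil(state_bytes / burst_bytes)
    return bursts * setup_us + state_bytes / bytes_per_us


def max_comm_wait_us(state_bytes, burst_bytes, setup_us, bytes_per_us):
    """Longest time a message may wait for the DMA engine.

    Assumes a pending message is served only between bursts, so it can
    wait at most for one full burst of the checkpoint transfer.
    """
    return setup_us + min(burst_bytes, state_bytes) / bytes_per_us


if __name__ == "__main__":
    # Illustrative values only; they are not measurements from the paper.
    STATE_BYTES, SETUP_US, BYTES_PER_US = 4096, 2.0, 128.0
    for burst in (64, 256, 1024, 4096):
        latency = checkpoint_latency_us(STATE_BYTES, burst, SETUP_US, BYTES_PER_US)
        wait = max_comm_wait_us(STATE_BYTES, burst, SETUP_US, BYTES_PER_US)
        print(f"burst={burst:5d} B  checkpoint={latency:6.1f} us  max wait={wait:5.1f} us")
```

Under these assumptions, longer bursts shorten the checkpoint (and so the window in which a freeze might be needed) while lengthening the worst-case wait for communication. Whether that wait matters in practice is what the reported measurements address.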
Citation:
Francesco Quaglia, Andrea Santoro, Bruno Ciciani, "Tuning of the Checkpointing and Communication Library for Optimistic Simulation on Myrinet Based NOWs," Ninth IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'01), p. 241, 2001