Nonblocking Checkpointing for Optimistic Parallel Simulation: Description and an Implementation
June 2003 (vol. 14 no. 6)
pp. 593-610

Abstract: This paper describes a nonblocking checkpointing mode in support of optimistic parallel discrete event simulation. This mode allows real concurrency between state saving and other simulation-specific operations (e.g., event list update, event execution), with the aim of removing the cost of recording state information from the completion time of the parallel simulation application. We present an implementation of a C library supporting nonblocking checkpointing on a Myrinet-based cluster, which demonstrates the practical viability of this checkpointing mode on standard off-the-shelf hardware. Through an empirical study on classical parameterized synthetic benchmarks, we show that, except for applications with minimal state granularity, nonblocking checkpointing improves the speed of the parallel execution compared to commonly adopted, optimized checkpointing methods based on the classical blocking mode. A performance study of a Personal Communication System (PCS) simulation is additionally reported to point out the benefits of nonblocking checkpointing for a real-world application.


Index Terms:
Parallel discrete-event simulation, optimistic synchronization, checkpointing, Myrinet, DMA, performance optimization.
Citation:
Francesco Quaglia, Andrea Santoro, "Nonblocking Checkpointing for Optimistic Parallel Simulation: Description and an Implementation," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 6, pp. 593-610, June 2003, doi:10.1109/TPDS.2003.1206506