A Cost Model for Selecting Checkpoint Positions in Time Warp Parallel Simulation
April 2001 (vol. 12 no. 4)
pp. 346-362

Abstract: Recent papers have shown that the performance of Time Warp simulators can be improved by appropriately selecting the positions of checkpoints instead of taking them periodically. In this paper, we present a checkpointing technique in which checkpoint positions are selected using a checkpointing-recovery cost model. Given the current state $S$, the model determines whether it is convenient to record $S$ as a checkpoint before the next event is executed. The decision takes into account the position of the most recent checkpoint, the granularity (i.e., the execution time) of the intermediate events, and an estimate of the probability that $S$ will have to be restored by a future rollback. A synthetic benchmark in different configurations is used to evaluate this approach and compare it with classical periodic techniques. The testing environment is a cluster of PCs connected through a Myrinet switch, coupled with a fast communication layer specifically designed to exploit the potential of this type of switch. The results show that our solution allows faster execution and, in some cases, has the additional advantage of requiring less memory for recording state vectors, which may further improve performance when memory is a critical resource for the application. Finally, a performance study of a cellular phone system simulation is reported to demonstrate the effectiveness of this solution for a real-world application.
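
The kind of decision the abstract describes can be illustrated with a minimal sketch. It is not the cost model from the paper, whose exact form is not given here. It assumes a simplified rule: record $S$ when the expected re-execution time avoided by having the checkpoint exceeds the time needed to save it. The function name `should_checkpoint` and its parameters are hypothetical.

```python
def should_checkpoint(save_cost, restore_prob, event_times_since_checkpoint):
    """Decide whether to record the current state S as a checkpoint.

    Illustrative assumption, not the paper's model: if S is not saved and
    must later be restored, the simulator reloads the most recent checkpoint
    and re-executes every intermediate event to rebuild S.

    save_cost: time needed to record S (same unit as the event times).
    restore_prob: estimated probability that S will be restored by a rollback.
    event_times_since_checkpoint: execution times (granularity) of the events
        executed since the most recent checkpoint.
    """
    if not 0.0 <= restore_prob <= 1.0:
        raise ValueError("restore_prob must be in [0, 1]")

    # Time spent replaying intermediate events if S has to be rebuilt.
    replay_cost = sum(event_times_since_checkpoint)

    # Expected replay time avoided by having S available as a checkpoint.
    expected_saving = restore_prob * replay_cost

    return expected_saving > save_cost


if __name__ == "__main__":
    # Three coarse-grained events since the last checkpoint, costly to replay.
    print(should_checkpoint(50.0, 0.2, [100.0, 100.0, 100.0]))
    # Same history, but S is less likely to be restored.
    print(should_checkpoint(50.0, 0.1, [100.0, 100.0, 100.0]))
```

Under this simplification, checkpoints become more attractive as the intermediate events get coarser or as the last checkpoint falls further behind, which matches the three inputs named in the abstract.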

Index Terms:
Parallel discrete-event simulation, checkpointing, rollback-recovery, Time Warp, optimistic synchronization, performance optimization, cost models.
Citation:
Francesco Quaglia, "A Cost Model for Selecting Checkpoint Positions in Time Warp Parallel Simulation," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 4, pp. 346-362, April 2001, doi:10.1109/71.920586