An Asynchronous Two-Level Checkpointing Method to Solve Adjoint Problems on Hierarchical Memory Spaces
Issue No. 04 - Jul./Aug. (2018 vol. 20)
Debanjan Datta, University of Texas at Austin
David Appelhans, IBM Research
Constantinos Evangelinos, IBM Research
Kirk Jordan, IBM Research
The problem of data reversal in discretized adjoint problems is often solved using checkpointing, which trades memory usage for recomputation and data movement. The authors present a model for designing and implementing an asynchronous two-level checkpointing method, with parameters that can be set for current and future system configurations. They also evaluate the benefits of new supercomputing hardware by implementing an asynchronous algorithm that takes advantage of the fast NVLink interconnect and Non-Volatile Memory Express (NVMe) flash memory. They show that the new hardware, combined with an asynchronous approach, can run larger simulations faster than current-generation hardware.
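To make the memory-for-recomputation trade-off concrete, here is a minimal sketch of plain single-level, synchronous checkpointing with uniform spacing. It is not the authors' asynchronous two-level method. The toy model `step`, its adjoint `step_adjoint`, the time step `DT`, and the spacing `k` are all hypothetical. Only every k-th forward state is stored. During the reverse (adjoint) sweep, each segment is recomputed from its checkpoint, so peak storage is roughly n/k + k states instead of n, at the cost of extra forward steps.

```python
import math

DT = 0.1  # hypothetical time-step size for the toy model


def step(x):
    """One forward time step of a toy nonlinear model (hypothetical)."""
    return x + DT * math.sin(x)


def step_adjoint(x, lam):
    """Adjoint of `step`: propagates lam backward using forward state x."""
    return lam * (1.0 + DT * math.cos(x))


def adjoint_store_all(x0, n):
    """Reference adjoint that keeps every forward state (O(n) memory)."""
    states = [x0]
    for _ in range(n):
        states.append(step(states[-1]))
    lam = 1.0  # seed: d(x_n)/d(x_n)
    for i in range(n - 1, -1, -1):
        lam = step_adjoint(states[i], lam)
    return lam


def adjoint_checkpointed(x0, n, k):
    """Store only every k-th state, then recompute each segment during the
    reverse sweep. Returns (gradient d(x_n)/d(x_0), forward steps run)."""
    forward_steps = 0
    checkpoints = {0: x0}
    x = x0
    for i in range(n):
        x = step(x)
        forward_steps += 1
        # Skip a checkpoint at the final state: no step starts from it.
        if (i + 1) % k == 0 and i + 1 < n:
            checkpoints[i + 1] = x
    lam = 1.0
    # Process segments from the last checkpoint back to the first.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, n)
        # Recompute states start..end-1 from the stored checkpoint.
        seg = [checkpoints[start]]
        for _ in range(start, end - 1):
            seg.append(step(seg[-1]))
            forward_steps += 1
        # Reverse sweep through the recomputed segment.
        for x_i in reversed(seg):
            lam = step_adjoint(x_i, lam)
    return lam, forward_steps


grad, steps = adjoint_checkpointed(0.5, 10, 4)
print(grad, adjoint_store_all(0.5, 10), steps)
```

With n = 10 and k = 4, the sketch stores three checkpoints (states 0, 4, and 8) and runs 17 forward steps: 10 in the initial sweep plus 7 recomputed ones. It still produces the same gradient as storing all 11 states. A two-level scheme such as the one described above would additionally place checkpoints in both a fast and a slow memory tier and overlap the transfers with computation.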
checkpointing, flash memories, parallel machines, peripheral interfaces, random-access storage
D. Datta, D. Appelhans, C. Evangelinos and K. Jordan, "An Asynchronous Two-Level Checkpointing Method to Solve Adjoint Problems on Hierarchical Memory Spaces," in Computing in Science & Engineering, vol. 20, no. 4, pp. 39-55, 2018.