An Asynchronous Two-Level Checkpointing Method to Solve Adjoint Problems on Hierarchical Memory Spaces
Issue No. 04 - Jul./Aug. (2018 vol. 20)
Debanjan Datta, University of Texas at Austin
David Appelhans, IBM Research
Constantinos Evangelinos, IBM Research
Kirk Jordan, IBM Research
The problem of data reversal in discretized adjoint problems is often solved using checkpointing, which trades memory usage for additional computation and data movement. The authors present a model for designing and implementing an asynchronous two-level checkpointing method, with parameters that can be set for current and future system configurations. They also evaluate the benefits of new supercomputing hardware by implementing an asynchronous algorithm that takes advantage of the fast NVLink interconnect and Non-Volatile Memory Express (NVMe) memory. They show that the new hardware, combined with an asynchronous approach, can run bigger simulations faster than current-generation hardware.
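To illustrate the basic trade-off that checkpointing makes, the following is a minimal sketch of single-level, synchronous, uniform-stride checkpointing for data reversal. It is not the authors' asynchronous two-level method. The names `step`, `reversed_states`, and `stride` are hypothetical, and `step` is only a stand-in for a real forward time step.

```python
def step(state):
    # Hypothetical forward time step (placeholder for a real solver update).
    return 2 * state + 1


def reversed_states(initial, n_steps, stride):
    """Yield forward states n_steps-1, ..., 0 in reverse order.

    Only every `stride`-th state is kept during the forward sweep;
    the states in between are recomputed during the reverse sweep.
    """
    # Forward sweep: store checkpoints at steps 0, stride, 2*stride, ...
    checkpoints = {}
    state = initial
    for i in range(n_steps):
        if i % stride == 0:
            checkpoints[i] = state
        state = step(state)

    # Reverse sweep: visit segments from last to first, recompute each
    # segment from its checkpoint, then emit its states backwards.
    last_start = ((n_steps - 1) // stride) * stride
    for start in range(last_start, -1, -stride):
        end = min(start + stride, n_steps)
        segment = [checkpoints[start]]
        for _ in range(start + 1, end):
            segment.append(step(segment[-1]))
        yield from reversed(segment)
```

A larger `stride` stores fewer states but recomputes more of them in the reverse sweep. A smaller `stride` does the opposite, and `stride = 1` stores every state and recomputes nothing.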
Checkpointing, Nonvolatile memory, Memory management, Random access memory, Computational modeling, Supercomputers, Mathematical model
D. Datta, D. Appelhans, C. Evangelinos and K. Jordan, "An Asynchronous Two-Level Checkpointing Method to Solve Adjoint Problems on Hierarchical Memory Spaces," in Computing in Science & Engineering, vol. 20, no. 4, pp. 39-55, 2018.