2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Hyderabad, India
May 25, 2015 to May 29, 2015
ISSN: 1530-2075
ISBN: 978-1-4799-8649-1
pp: 914-922
The scale of high performance computing (HPC) systems is growing exponentially, potentially causing a prohibitive shrinkage of the mean time between failures (MTBF), while the increase in the I/O performance of parallel file systems will fall far behind the increase in scale. Consequently, there have been various attempts to reduce checkpoint overhead, one of which is to apply compression techniques to checkpoint files. Most existing techniques focus on lossless compression, but their compression rates, and thus their effectiveness, remain rather limited. Instead, we propose a lossy compression technique for checkpoints based on wavelet transformation, and explore its impact on application results. Experimental application of our lossy compression technique to a production climate application, NICAM, shows that the overall checkpoint time including compression is reduced by 81%, while the relative error remains fairly constant at approximately 1.2% on average over all variables of the compressed physical quantities, compared to the original checkpoint without compression.
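To illustrate the general idea of wavelet-based lossy compression described in the abstract, the following is a minimal, hypothetical sketch: a one-level Haar wavelet transform followed by thresholding of small detail coefficients. This is not the paper's actual pipeline (which involves quantization and encoding of the transformed checkpoint data); all function names and the threshold value here are illustrative assumptions.

```python
import numpy as np

def haar_1d(x):
    # One level of the 1-D Haar wavelet transform:
    # pairwise averages (approximation) and differences (detail).
    avg = (x[0::2] + x[1::2]) / 2.0
    diff = (x[0::2] - x[1::2]) / 2.0
    return avg, diff

def lossy_compress(data, threshold=0.05):
    # Transform, then zero out small detail coefficients.
    # Dropping detail is the lossy step; zeros then compress well.
    avg, diff = haar_1d(data)
    diff[np.abs(diff) < threshold] = 0.0
    return avg, diff

def decompress(avg, diff):
    # Inverse Haar transform: reconstructs an approximation of the input.
    out = np.empty(avg.size * 2)
    out[0::2] = avg + diff
    out[1::2] = avg - diff
    return out

# Illustrate on a smooth array standing in for a physical quantity.
data = np.sin(np.linspace(0, np.pi, 64))
avg, diff = lossy_compress(data, threshold=0.05)
restored = decompress(avg, diff)
rel_err = np.max(np.abs(restored - data)) / np.max(np.abs(data))
```

For smooth fields such as those in a climate model, most detail coefficients are small, so the reconstruction error stays modest even after many coefficients are discarded, which is the property the paper exploits.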
Quantization (signal), Arrays, Wavelet transforms, Image coding, Data models, Checkpointing, Computational modeling

N. Sasaki, K. Sato, T. Endo and S. Matsuoka, "Exploration of Lossy Compression for Application-Level Checkpoint/Restart," 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Hyderabad, India, 2015, pp. 914-922.