2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2015)
May 25, 2015 to May 29, 2015
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/IPDPS.2015.67
The scale of high performance computing (HPC) systems is growing exponentially, potentially causing a prohibitive shrinkage of the mean time between failures (MTBF), while the increase in the I/O performance of parallel file systems will lag far behind the increase in scale. As such, there have been various attempts to decrease checkpoint overhead, one of which is to apply compression techniques to the checkpoint files. While most existing techniques focus on lossless compression, their compression rates, and thus their effectiveness, remain rather limited. Instead, we propose a lossy compression technique based on wavelet transformation for checkpoints, and explore its impact on application results. Experimental application of our lossy compression technique to a production climate application, NICAM, shows that the overall checkpoint time including compression is reduced by 81%, while the relative error remains fairly constant at approximately 1.2%, averaged over all variables of the compressed physical quantities, compared to the original checkpoint without compression.
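The abstract's core idea, wavelet-based lossy compression, can be illustrated with a minimal sketch: transform the data into wavelet coefficients, then quantize the detail coefficients so that small values collapse to few distinct levels (which compress well), at the cost of a bounded relative error on reconstruction. This is not the paper's implementation; it uses a single-level Haar transform and uniform quantization purely as an assumed, simplified stand-in for the technique described.

```python
# Illustrative sketch (assumption: NOT the authors' code) of lossy
# wavelet compression: Haar transform -> quantize details -> reconstruct.

def haar_forward(x):
    """One level of the Haar transform: pairwise averages and details."""
    avg = [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
    det = [(x[2 * i] - x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
    return avg, det

def haar_inverse(avg, det):
    """Exact inverse of haar_forward (lossless if details are untouched)."""
    x = []
    for a, d in zip(avg, det):
        x.extend([a + d, a - d])
    return x

def quantize(coeffs, step):
    """Uniform quantization: the lossy step; small details snap to zero."""
    return [round(c / step) * step for c in coeffs]

# Toy "checkpoint" array of smoothly varying physical values.
data = [1.0, 1.1, 0.9, 1.05, 2.0, 2.02, 1.98, 2.1]
avg, det = haar_forward(data)
det_q = quantize(det, 0.1)            # coarse step discards small details
restored = haar_inverse(avg, det_q)
rel_err = max(abs(r - d) / abs(d) for r, d in zip(restored, data))
```

After quantization most detail coefficients become zero, so an entropy coder applied afterwards achieves a far higher compression rate than lossless compression of the raw floats, while `rel_err` stays bounded by the quantization step, mirroring the small, roughly constant relative error the paper reports for NICAM.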
Quantization (signal), Arrays, Wavelet transforms, Image coding, Data models, Checkpointing, Computational modeling
N. Sasaki, K. Sato, T. Endo and S. Matsuoka, "Exploration of Lossy Compression for Application-Level Checkpoint/Restart," 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Hyderabad, India, 2015, pp. 914-922.