Parallel Computing in Electrical Engineering, 2004. International Conference on (2004)
Sept. 7, 2004 to Sept. 10, 2004
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PCEE.2004.72
Pawel Czarnul , Gdansk University of Technology, Poland
Arkadiusz Urbaniak , Gdansk University of Technology, Poland
Marcin Fraczak , Gdansk University of Technology, Poland
Maciej Dyczkowski , Wroclaw University of Technology
Bartlomiej Balcerek , Wroclaw University of Technology
While many kernel-level and user-level libraries and systems support checkpointing running processes and resuming their operation, it is still very difficult to provide an easy-to-use tool that assists in checkpointing parallel applications. In this work, we aim to develop an easy-to-use, user-guided library that supports checkpointing of parallel MPI applications executed within the CLUSTERIX environment, i.e., a collection of distributed HPC clusters. We propose a programmer-assisted approach with process state packing and unpacking at the code level for SPMD HPC applications. Although the library is at an early stage of development, we present checkpoint/restart times and execution times of an application interrupted by checkpointing for the proposed approach, compared with the same application linked with the ckpt user-level library.
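The programmer-assisted packing and unpacking of process state mentioned in the abstract can be illustrated with a minimal sketch in C. It is not the library's API, which the abstract does not describe: the `proc_state` structure, the function names, and the one-file-per-rank naming scheme are all assumptions made for illustration. To keep the sketch free of an MPI dependency, the process rank is passed in as a plain integer; in an MPI program it would typically come from `MPI_Comm_rank`, and processes would usually synchronize (for example with `MPI_Barrier`) before saving.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-process state of an SPMD application (not from the paper). */
typedef struct {
    int    iteration;   /* next iteration to execute after a restart */
    double residual;    /* a running scalar value */
    double data[4];     /* this process's local partition of the data */
} proc_state;

/* Size of the packed representation: the fields without struct padding. */
#define STATE_BYTES (sizeof(int) + sizeof(double) + 4 * sizeof(double))

/* Pack the state field by field into a flat buffer; returns bytes written. */
size_t pack_state(const proc_state *s, unsigned char *buf) {
    size_t off = 0;
    assert(s != NULL && buf != NULL);
    memcpy(buf + off, &s->iteration, sizeof s->iteration); off += sizeof s->iteration;
    memcpy(buf + off, &s->residual,  sizeof s->residual);  off += sizeof s->residual;
    memcpy(buf + off, s->data,       sizeof s->data);      off += sizeof s->data;
    return off;
}

/* Unpack in exactly the same field order; returns bytes consumed. */
size_t unpack_state(proc_state *s, const unsigned char *buf) {
    size_t off = 0;
    assert(s != NULL && buf != NULL);
    memcpy(&s->iteration, buf + off, sizeof s->iteration); off += sizeof s->iteration;
    memcpy(&s->residual,  buf + off, sizeof s->residual);  off += sizeof s->residual;
    memcpy(s->data,       buf + off, sizeof s->data);      off += sizeof s->data;
    return off;
}

/* Save the packed state of process `rank` to "<prefix>.<rank>".
   Returns 0 on success, -1 on failure. */
int checkpoint_save(const char *prefix, int rank, const proc_state *s) {
    unsigned char buf[STATE_BYTES];
    char path[256];
    size_t n = pack_state(s, buf);
    snprintf(path, sizeof path, "%s.%d", prefix, rank);
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t written = fwrite(buf, 1, n, f);
    if (fclose(f) != 0 || written != n) return -1;
    return 0;
}

/* Restore the state of process `rank` from "<prefix>.<rank>".
   Returns 0 on success, -1 if no complete checkpoint is available. */
int checkpoint_restore(const char *prefix, int rank, proc_state *s) {
    unsigned char buf[STATE_BYTES];
    char path[256];
    snprintf(path, sizeof path, "%s.%d", prefix, rank);
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;            /* caller starts from scratch */
    size_t got = fread(buf, 1, sizeof buf, f);
    fclose(f);
    if (got != sizeof buf) return -1;    /* truncated checkpoint file */
    unpack_state(s, buf);
    return 0;
}
```

In such a scheme, an application would call `checkpoint_restore` once at startup and, if it succeeds, resume its main loop from the restored `iteration`; it would call `checkpoint_save` every few iterations. The packed bytes use the host's native representation, so this sketch assumes restart on a machine of the same architecture. MPI's own `MPI_Pack` and `MPI_Unpack` routines are another way to build such a buffer.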
Process Checkpointing, Checkpointing Parallel Applications, Parallel Software Environments
A. Urbaniak, P. Czarnul, M. Dyczkowski, M. Fraczak and B. Balcerek, "Towards Easy-to-Use Checkpointing of MPI Applications within CLUSTERIX," Parallel Computing in Electrical Engineering, 2004. International Conference on (PARELEC), Dresden, Germany, 2004, pp. 390-393.