Fourth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'04)
Hybrid checkpointing for parallel applications in cluster federations
Chicago, IL, USA
April 19-April 22
ISBN: 0-7803-8430-X
BibTeX:
@article{10.1109/CCGrid.2004.1336712,
  author = {S. Monnet and C. Morin and R. Badrinath},
  title = {Hybrid checkpointing for parallel applications in cluster federations},
  journal = {Cluster Computing and the Grid, IEEE International Symposium on},
  volume = {0},
  year = {2004},
  isbn = {0-7803-8430-X},
  pages = {773-782},
  doi = {http://doi.ieeecomputersociety.org/10.1109/CCGrid.2004.1336712},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Cluster federations are attractive for executing applications such as large-scale code coupling. However, faults may occur frequently in such architectures, so checkpointing long-running applications is desirable to avoid restarting them from the beginning when a node fails. To take into account the constraints of a cluster federation architecture, a hybrid checkpointing protocol is proposed. It uses global coordinated checkpointing inside clusters but only quasi-synchronous checkpointing techniques between clusters. The proposed protocol has been evaluated by simulation and is well suited to applications that can be divided into modules with many communications within modules but few between them.
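The split between intra-cluster and inter-cluster checkpointing can be illustrated with a toy model. The sketch below is not the protocol from the paper: it assumes a common index-based, communication-induced rule (a cluster that receives a message carrying a higher checkpoint sequence number forces a checkpoint before delivering it), and the names `Cluster`, `coordinated_checkpoint`, `send`, and `receive` are hypothetical. The intra-cluster coordinated checkpoint is reduced to a single call that stands in for all nodes saving their state together.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    """Toy cluster whose nodes always checkpoint together (hypothetical model)."""
    name: str
    nodes: list
    sn: int = 0  # sequence number of the latest cluster-wide checkpoint
    log: list = field(default_factory=list)

    def coordinated_checkpoint(self, reason: str, sn: Optional[int] = None) -> None:
        # Stand-in for a coordinated checkpoint: every node of the cluster
        # saves its state under the same sequence number.
        self.sn = self.sn + 1 if sn is None else sn
        self.log.append(f"ckpt {self.sn} ({reason}) on {len(self.nodes)} nodes")

    def send(self, payload: str) -> dict:
        # Inter-cluster messages piggyback the sender's sequence number.
        return {"from": self.name, "sn": self.sn, "payload": payload}

    def receive(self, msg: dict) -> None:
        # Assumed index-based rule: if the sender is ahead, force a
        # checkpoint before delivery so that no checkpoint of this cluster
        # records a message sent after the sender's matching checkpoint.
        if msg["sn"] > self.sn:
            self.coordinated_checkpoint("forced", sn=msg["sn"])
        self.log.append(f"deliver {msg['payload']!r} from {msg['from']}")


a = Cluster("A", ["a1", "a2"])
b = Cluster("B", ["b1", "b2", "b3"])

a.coordinated_checkpoint("periodic")   # A checkpoints on its own schedule
b.receive(a.send("boundary data"))     # B is behind, so it is forced to checkpoint
b.receive(a.send("more data"))         # B has caught up, no forced checkpoint

for line in b.log:
    print(line)
```

In this model, clusters never synchronize with each other explicitly; a cluster takes an extra checkpoint only when an incoming message shows that it has fallen behind. Fewer inter-cluster messages therefore mean fewer forced checkpoints, which matches the abstract's remark that the approach suits applications with little communication between modules.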
Citation:
S. Monnet, C. Morin, R. Badrinath, "Hybrid checkpointing for parallel applications in cluster federations," Fourth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'04), pp. 773-782, 2004.
