Second IEEE International Conference on e-Science and Grid Computing (e-Science'06)
Practical Fault-Tolerant Framework for eScience Infrastructure
Amsterdam, Netherlands
December 4-6, 2006
ISBN: 0-7695-2734-5
DOI: 10.1109/E-SCIENCE.2006.109
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Many areas of science now rely on computing resources as an important part of their research, and many research groups adopt a cluster architecture to use those resources efficiently and manage them easily. Fault tolerance therefore becomes a very important property of these computing resources. However, fault-tolerant systems have not yet been widely adopted because they are hard to deploy, hard to use, hard to manage, hard to maintain, or hard to justify.
This paper proposes a practical fault-tolerant system for eScience infrastructures. Our system uses a checkpoint/restart mechanism for fault tolerance and provides an easy mechanism for integrating with the Grid services widely used in eScience. Additionally, we run rigorous tests using scientific applications to verify that our system can be used in clusters. We also describe improvements made to our system to solve various problems that arose when deploying it on a cluster. The experimental results show not only that our system adapts well to various types of runtime environments, but also that it can be practically deployed in clusters.
Citation:
Hyuck Han, Jai Wug Kim, Jongpil Lee, Young Jin Yu, Kiyoung Kim, Heon Y. Yeom, "Practical Fault-Tolerant Framework for eScience Infrastructure," Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), p. 57, 2006.
