2008 IEEE Fourth International Conference on eScience (2008)
Dec. 7, 2008 to Dec. 12, 2008
For a class of scientific applications, utilizing distributed resources is critical to reducing the time-to-solution. In this paper, we discuss one such class, Replica-Exchange simulations, where the orchestration of many distributed jobs in a dynamic and inherently unreliable distributed environment is essential for successful completion. We describe the design, development, and deployment of a unique framework for constructing fault-tolerant distributed simulations.
SAGA, Migol, Fault-Tolerance, Replica-Exchange, Grid Computing
Shantenu Jha, Andre Merzky, Joohyun Kim, Bettina Schnor, André Luckow, "Distributed Replica-Exchange Simulations on Production Environments Using SAGA and Migol", 2008 IEEE Fourth International Conference on eScience, vol. 00, pp. 253-260, 2008, doi:10.1109/eScience.2008.20