2008 IEEE Fourth International Conference on eScience (2008)
Dec. 7, 2008 to Dec. 12, 2008
There exists a class of scientific applications for which utilizing distributed resources is critical for reducing the time-to-solution. In this paper, we discuss one such class of applications, Replica-Exchange simulations, in which the orchestration of many distributed jobs in a dynamic and inherently unreliable distributed environment is essential for successful completion. We describe the design, development, and deployment of a unique framework for constructing fault-tolerant distributed simulations.
SAGA, Migol, Fault-Tolerance, Replica-Exchange, Grid Computing
S. Jha, A. Merzky, J. Kim, B. Schnor and A. Luckow, "Distributed Replica-Exchange Simulations on Production Environments Using SAGA and Migol," 2008 IEEE Fourth International Conference on eScience (ESCIENCE), Indianapolis, IN, 2008, pp. 253-260.