2008 22nd International Symposium on High Performance Computing Systems and Applications
A Framework for Executing Long Running Jobs in Grid Environments
June 9-11, 2008
ISBN: 978-0-7695-3250-9
Computational jobs that take days, weeks or months to run usually cannot be executed as a single job due to system failures and scheduling constraints. Instead, the job must be split into a series of shorter jobs. Solutions for managing the execution of such jobs in grid environments must address many issues. Participating systems and their properties can change over time, so dynamic resource discovery mechanisms are important. Data management tools are needed to manage and keep track of data that can be distributed across multiple sites. Fault tolerance is required to handle the many different errors and failures that can occur in such environments. Furthermore, support for job reconfiguration, in terms of the number of processors, run length, and memory required, is necessary to allow jobs to adapt to the heterogeneous resources they are submitted to. This paper presents a framework for executing long running jobs in grid environments that addresses the above issues. The framework automates checkpointing, migration and reconfiguration of jobs. It has been successfully tested with the GROMACS molecular dynamics simulation application in a GT4-based grid environment composed of resources distributed across Canada.
Index Terms:
Execution Framework, Adaptive Scheduling, Grid Computing
Citation:
Nayden Markatchev, Cameron Kiddle, Rob Simmonds, "A Framework for Executing Long Running Jobs in Grid Environments," 2008 22nd International Symposium on High Performance Computing Systems and Applications (HPCS), pp. 69-75, 2008