Cluster Computing and the Grid, IEEE International Symposium on (2007)
Rio de Janeiro, Brazil
May 14, 2007 to May 17, 2007
Michael Klemm, University of Erlangen-Nuremberg
Matthias Bezold, University of Erlangen-Nuremberg
Stefan Gabriel, University of Erlangen-Nuremberg
Ronald Veldema, University of Erlangen-Nuremberg
Michael Philippsen, University of Erlangen-Nuremberg
Typical computational grid users target only a single cluster and have to estimate the runtime of their jobs. Job schedulers prefer short-running jobs to maintain high system utilization. If the user underestimates the runtime, premature termination causes computation loss; overestimation is penalized by long queue times. As a solution, we present automatic reparallelization and migration of OpenMP applications. When the number of CPUs changes, a reparallelization is computed dynamically for an OpenMP work distribution. When an allocated time slice is exceeded, the application can be migrated between clusters. Migration is based on a coordinated, heterogeneous checkpointing algorithm. Together, reparallelization and migration enable the user to freely use computing time at more than a single point of the grid. Our demo applications successfully adapt to the changed CPU setting and smoothly migrate between, for example, clusters in Erlangen, Germany, and Amsterdam, the Netherlands, that use different processors. Benchmarks show that reparallelization and migration impose average overheads of about 4% and 2%, respectively.
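The abstract does not spell out how the work distribution is recomputed, so the following is only a minimal sketch of the general idea, assuming a static block distribution of loop iterations. The function names (static_chunks, redistribute) and the per-worker progress list are hypothetical and are not taken from the paper. Iterations that were not yet executed are collected and split again over the new number of workers.

```python
def split_evenly(items, workers):
    """Split a list into `workers` contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(items), workers)
    parts, pos = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        parts.append(items[pos:pos + size])
        pos += size
    return parts


def static_chunks(start, end, workers):
    """Initial static distribution: one half-open range (lo, hi) per worker."""
    parts = split_evenly(list(range(start, end)), workers)
    chunks, lo = [], start
    for part in parts:
        chunks.append((lo, lo + len(part)))
        lo += len(part)
    return chunks


def redistribute(chunks, progress, new_workers):
    """Recompute the distribution after the worker count changes.

    progress[i] is the first iteration of chunks[i] that has NOT been
    executed yet (an assumed bookkeeping format, not the paper's).
    Returns one list of remaining iteration indices per new worker.
    """
    remaining = []
    for (lo, hi), nxt in zip(chunks, progress):
        remaining.extend(range(nxt, hi))
    return split_evenly(remaining, new_workers)


if __name__ == "__main__":
    chunks = static_chunks(0, 12, 4)      # four workers, three iterations each
    progress = [2, 5, 7, 9]               # each worker's next unexecuted iteration
    print(redistribute(chunks, progress, 2))
```

In this sketch the remaining iterations keep their original indices, so a loop body that depends only on the iteration number computes the same result regardless of which worker executes it after the change.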
M. Bezold, S. Gabriel, R. Veldema, M. Klemm and M. Philippsen, "Reparallelization and Migration of OpenMP Programs," Cluster Computing and the Grid, IEEE International Symposium on (CCGRID), Rio de Janeiro, Brazil, 2007, pp. 529-540.