Reducing Time-to-Solution Using Distributed High-Throughput Mega-Workflows - Experiences from SCEC CyberShake
2008 IEEE Fourth International Conference on eScience (2008)
Dec. 7, 2008 to Dec. 12, 2008
Researchers at the Southern California Earthquake Center (SCEC) use large-scale grid-based scientific workflows to perform seismic hazard research as part of SCEC's program of earthquake system science research. The scientific goal of the SCEC CyberShake project is to calculate probabilistic seismic hazard curves for sites in Southern California. For each site of interest, the CyberShake platform runs two large-scale MPI calculations and approximately 840,000 embarrassingly parallel post-processing jobs. In this paper, we describe the computational requirements of CyberShake and detail how we meet these requirements using grid-based, high-throughput, scientific workflow tools. We describe the specific challenges we encountered, discuss workflow throughput optimizations we developed that reduced our time-to-solution by a factor of three, present runtime statistics, and propose further optimizations.
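The "embarrassingly parallel" structure mentioned in the abstract means each post-processing job is independent and can be fanned out with no inter-task communication. A minimal local sketch of that pattern is below; it is purely illustrative (the function names and placeholder results are hypothetical), since the actual CyberShake platform dispatches such jobs across grid resources with Pegasus and Condor rather than a local executor.

```python
from concurrent.futures import ThreadPoolExecutor

def process_rupture(rupture_id):
    # Hypothetical stand-in for one CyberShake post-processing job.
    # Real jobs combine precomputed strain Green tensors with a rupture
    # description to produce a synthetic seismogram; here we just return
    # a placeholder result keyed by the job id.
    return rupture_id, f"seismogram-{rupture_id}"

def run_batch(n_jobs, workers=4):
    # At grid scale a workflow system (Pegasus/Condor) plays this role;
    # a local executor illustrates the same independent fan-out pattern.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_rupture, range(n_jobs)))

results = run_batch(8)
```

Because the jobs share no state, throughput scales with the number of available workers, which is what makes high-throughput workflow scheduling the dominant concern at CyberShake's scale.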
scientific workflows, high throughput, Globus, Pegasus, Condor, optimization, seismology, TeraGrid
Thomas Jordan, Scott Callaghan, Edward Field, Gideon Juve, Ewa Deelman, Gaurang Mehta, Karan Vahi, Keith Beattie, Dan Gunter, Robert Graves, Philip Maechling, Kevin Milner, David Okaya, "Reducing Time-to-Solution Using Distributed High-Throughput Mega-Workflows - Experiences from SCEC CyberShake", 2008 IEEE Fourth International Conference on eScience, pp. 151-158, 2008, doi:10.1109/eScience.2008.60