2016 Seventh International Workshop on Data-Intensive Computing in the Clouds (DataCloud) (2016)
Salt Lake City, Utah, USA
Nov. 14, 2016
We present Asterism, an open source data-intensive framework that combines the strengths of traditional workflow management systems with new parallel stream-based dataflow systems to run data-intensive applications across multiple heterogeneous resources, without users having to: re-formulate their methods according to different enactment engines; manage the data distribution across systems; parallelize their methods; co-place and schedule their methods with computing resources; and store and transfer large/small volumes of data. We also present the Data-Intensive workflows as a Service (DIaaS) model, which enables easy data-intensive workflow composition and deployment on clouds using containers. The feasibility of Asterism and the DIaaS model has been evaluated using a real domain application on the NSF-Chameleon cloud. Experimental results show that Asterism successfully and efficiently exploits combinations of diverse computational platforms, whereas DIaaS delivers specialized software to execute data-intensive applications in a scalable, efficient, and robust way, reducing the engineering time and computational cost.
Containers, Computational modeling, Engines, Data models, Storms, Parallel processing, Monitoring
Rosa Filgueira, Rafael Ferreira da Silva, Amrey Krause, Ewa Deelman, Malcolm Atkinson, "Asterism: Pegasus and Dispel4py Hybrid Workflows for Data-Intensive Science", 2016 Seventh International Workshop on Data-Intensive Computing in the Clouds (DataCloud), pp. 1-8, 2016, doi:10.1109/DataCloud.2016.004