2016 Seventh International Workshop on Data-Intensive Computing in the Clouds (DataCloud) (2016)
Salt Lake City, Utah, USA
Nov. 14, 2016
We present Asterism, an open-source data-intensive framework that combines the strengths of traditional workflow management systems with new parallel stream-based dataflow systems to run data-intensive applications across multiple heterogeneous resources, without users having to: re-formulate their methods according to different enactment engines; manage the data distribution across systems; parallelize their methods; co-place and schedule their methods with computing resources; and store and transfer large/small volumes of data. We also present the Data-Intensive workflows as a Service (DIaaS) model, which enables easy data-intensive workflow composition and deployment on clouds using containers. The feasibility of Asterism and the DIaaS model has been evaluated using a real domain application on the NSF-Chameleon cloud. Experimental results show that Asterism successfully and efficiently exploits combinations of diverse computational platforms, whereas DIaaS delivers specialized software to execute data-intensive applications in a scalable, efficient, and robust way, reducing the engineering time and computational cost.
Containers, Computational modeling, Engines, Data models, Storms, Parallel processing, Monitoring, deployment and reusability of execution environments, Data-Intensive science, scientific workflows, stream-based system
Rosa Filgueira, Rafael Ferreira da Silva, Amrey Krause, Ewa Deelman, Malcolm Atkinson, "Asterism: Pegasus and Dispel4py Hybrid Workflows for Data-Intensive Science", 2016 Seventh International Workshop on Data-Intensive Computing in the Clouds (DataCloud), pp. 1-8, 2016, doi:10.1109/DataCloud.2016.004