WA-Dataspaces: Exploring the Data Staging Abstractions for Wide-Area Distributed Scientific Workflows
2017 46th International Conference on Parallel Processing (ICPP) (2017)
Bristol, United Kingdom
Aug. 14, 2017 to Aug. 17, 2017
DOI: http://doi.ieeecomputersociety.org/10.1109/ICPP.2017.34
Data staging has been shown to be very effective for supporting data-intensive in-situ workflows and application coupling. Experimental science is increasingly collaborative, involving geographically distributed teams, experimental instruments, and HPC facilities. This new way of doing science poses new challenges arising from data sizes, computational complexity, and the use of wide-area networks between coupled components. In this paper, we explore how the staging abstraction can be extended to support such workflows. Specifically, we develop a NUMA-like abstraction that orchestrates multiple distributed local-area staging abstractions and provides asynchronous data put/get semantics to enable data sharing across them. To mask data movement overhead and provide timely data access, we propose predictive prefetching approaches that exploit the iterative nature of the coupling. We evaluate our prototype implementation using a fusion workflow and show that our design can effectively and transparently support wide-area coupled workflows. Additionally, the results show that prefetching significantly reduces access times for data that must be moved over the wide-area network.
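The abstract does not specify the prefetching algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions. It uses a simple stride predictor over iteration numbers as a stand-in for "predictive prefetching that exploits the iterative nature of the coupling". Two in-memory dictionaries play the role of a local and a remote staging service. Reads are served locally first and fall back to the remote site, which loosely mirrors the NUMA-like local/remote split. The names `StagingArea`, `PrefetchingClient`, `wan_delay`, and the `stats` counters are hypothetical and are not the WA-Dataspaces API.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, int]  # (variable name, iteration/version); hypothetical key scheme


class StagingArea:
    """Hypothetical in-memory stand-in for one site's staging service."""

    def __init__(self) -> None:
        self.store: Dict[Key, Any] = {}

    def put(self, var: str, version: int, data: Any) -> None:
        self.store[(var, version)] = data

    def get(self, var: str, version: int) -> Optional[Any]:
        return self.store.get((var, version))


class PrefetchingClient:
    """Consumer-side client (hypothetical): reads from the local staging area,
    falls back to the remote site, and prefetches the predicted next version
    from the remote site in a background thread."""

    def __init__(self, local: StagingArea, remote: StagingArea,
                 wan_delay: float = 0.0) -> None:
        self.local, self.remote, self.wan_delay = local, remote, wan_delay
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.inflight: Dict[Key, Future] = {}  # prefetches not yet consumed
        self.prev_version: Dict[str, int] = {}  # last version read per variable
        self.stats = {"local": 0, "prefetched": 0, "demand": 0}

    def _fetch_remote(self, key: Key) -> Optional[Any]:
        time.sleep(self.wan_delay)  # simulated wide-area transfer cost
        data = self.remote.get(*key)
        if data is not None:
            self.local.put(*key, data)  # cache the copy at the consumer's site
        return data

    def _prefetch(self, key: Key) -> None:
        # Start an asynchronous remote fetch unless the data is already here or pending.
        if key not in self.inflight and self.local.get(*key) is None:
            self.inflight[key] = self.pool.submit(self._fetch_remote, key)

    def get(self, var: str, version: int) -> Optional[Any]:
        key = (var, version)
        fut = self.inflight.pop(key, None)
        data = fut.result() if fut is not None else None  # wait for a pending prefetch
        if data is not None:
            self.stats["prefetched"] += 1
        else:
            data = self.local.get(*key)
            if data is not None:
                self.stats["local"] += 1
            else:
                self.stats["demand"] += 1  # blocking fetch over the WAN
                data = self._fetch_remote(key)
        # Predict the next request from the stride between consecutive reads.
        prev = self.prev_version.get(var)
        self.prev_version[var] = version
        if prev is not None and version != prev:
            self._prefetch((var, version + (version - prev)))
        return data

    def close(self) -> None:
        self.pool.shutdown(wait=True)
```

Suppose a consumer reads versions 0 through 4 of one variable in order and the remote site already holds them. The first two reads are fetched on demand, because the stride predictor needs two observations before it can guess. The remaining three reads are then served by prefetches that were issued one iteration earlier. If a prefetch runs before the producer has staged that version, the sketch discards the empty result and falls back to an on-demand fetch.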
Distributed databases, Couplings, Prefetching, Wide area networks, Peer-to-peer computing, Data models, Semantics
M. F. Aktas, J. Diaz-Montes, I. Rodero and M. Parashar, "WA-Dataspaces: Exploring the Data Staging Abstractions for Wide-Area Distributed Scientific Workflows," 2017 46th International Conference on Parallel Processing (ICPP), Bristol, United Kingdom, 2017, pp. 251-260.