2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Washington, DC, USA
May 1, 2018 to May 4, 2018
With the rapid deployment of cloud infrastructures around the globe and the economic benefit of cloud-based computing and storage services, an increasing number of scientific workflows have been shifted or are in active transition to clouds. As the scale of scientific applications continues to grow, it is now common to deploy data- and network-intensive computing workflows across multi-clouds, where inter-cloud data transfer has a significant impact on both workflow performance and financial cost. We construct rigorous mathematical models to analyze the intra- and inter-cloud execution dynamics of scientific workflows and formulate a budget-constrained workflow mapping problem to optimize the network performance of MapReduce-based scientific workflows in Hadoop systems in multi-cloud environments. We show this problem to be NP-complete and design a heuristic solution that takes into consideration module execution, data transfer, and I/O operations. The performance superiority of the proposed mapping solution over existing methods is illustrated through extensive simulations and further verified by real-life workflow experiments deployed in public clouds. We observe a discrepancy of about 15% between our theoretical estimates and real-world experimental measurements, which supports the validity of our cost models and their use for accurate workflow mapping in real systems.
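To make the idea of budget-constrained workflow mapping concrete, the following is a minimal Python sketch of one generic greedy approach. It is not the heuristic proposed in the paper: it ignores data transfer and I/O costs, and every name in it (`makespan`, `greedy_map`, the `(time, cost)` option lists) is a hypothetical choice made for illustration. Each workflow module has several candidate placements, each with an execution time and a monetary cost. The sketch starts from the cheapest placement for every module and then repeatedly applies the upgrade that yields the largest makespan reduction per unit of extra cost, as long as the budget allows.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each module has candidate (time, cost) options,
# e.g. one per VM type or cloud. Not taken from the paper.
Options = Dict[str, List[Tuple[float, float]]]


def makespan(order: List[str], preds: Dict[str, List[str]],
             times: Dict[str, float]) -> float:
    """Longest path through the DAG; `order` must be a topological order."""
    finish: Dict[str, float] = {}
    for m in order:
        start = max((finish[p] for p in preds.get(m, [])), default=0.0)
        finish[m] = start + times[m]
    return max(finish.values())


def greedy_map(order: List[str], preds: Dict[str, List[str]],
               options: Options, budget: float):
    """Return (choice, makespan, cost) found by a simple greedy upgrade rule."""
    # Start with the cheapest option for every module.
    choice = {m: min(range(len(options[m])), key=lambda i: options[m][i][1])
              for m in order}

    def total_cost() -> float:
        return sum(options[m][choice[m]][1] for m in order)

    def current_times() -> Dict[str, float]:
        return {m: options[m][choice[m]][0] for m in order}

    if total_cost() > budget:
        raise ValueError("budget is below the cheapest possible mapping")

    while True:
        base = makespan(order, preds, current_times())
        spent = total_cost()
        best = None  # (gain per extra cost, module, option index)
        for m in order:
            cur_cost = options[m][choice[m]][1]
            for i, (t, c) in enumerate(options[m]):
                extra = c - cur_cost
                if extra <= 0 or spent + extra > budget:
                    continue  # not an upgrade, or unaffordable
                trial = current_times()
                trial[m] = t
                gain = base - makespan(order, preds, trial)
                if gain > 0 and (best is None or gain / extra > best[0]):
                    best = (gain / extra, m, i)
        if best is None:  # no affordable upgrade shortens the makespan
            return choice, base, spent
        choice[best[1]] = best[2]
```

Because the underlying problem is NP-complete, a greedy rule like this one can stop at a mapping that is affordable but not optimal, which is why the quality of the heuristic matters in practice.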
cloud computing, computational complexity, data handling, parallel processing, workflow management software
H. Cao and C. Q. Wu, "Performance Optimization of Budget-Constrained MapReduce Workflows in Multi-Clouds," 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Washington, DC, USA, 2018, pp. 243-252.