2018 IEEE International Conference on Cluster Computing (CLUSTER) (2018)
Belfast, United Kingdom
Sep 10, 2018 to Sep 13, 2018
We present techniques for optimizing the performance of data-intensive workflows that execute on geographically distributed and heterogeneous resources. We optimize for both throughput and response time. To improve throughput, we alleviate data-transfer bottlenecks: to hide the latency of accessing remote data, we transparently introduce prefetching (overlapping data transfer and computation) without changing workflow source code. To improve response time, we introduce intelligent scheduling for a set of high-priority tasks. It replaces a greedy scheduler that assigns tasks without accounting for their differing performance on heterogeneous resources, which leads to long latencies. Intelligent scheduling rapidly selects a near-optimal solution to a bi-objective optimization problem. One objective is a good task assignment; the other is to minimize I/O contention by distributing load across resources and time. To reason about task completion times, we use modeling tools to generate accurate predictions of execution times. We show performance results for the Belle II workflow for high-energy physics. The combination of these techniques can improve throughput over production Belle II configurations by 20-40%. Our work is general and adaptable to other distributed workflows.
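The prefetching idea mentioned in the abstract (overlapping data transfer with computation) can be illustrated with a minimal sketch. This is not the authors' implementation: `fetch`, `compute`, and `run_with_prefetch` are hypothetical names, the sleeps stand in for remote transfers and task execution, and a single background thread is assumed to be enough to show the overlap.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch(name):
    """Hypothetical stand-in for a remote data transfer."""
    time.sleep(0.05)
    return f"data:{name}"


def compute(data):
    """Hypothetical stand-in for a workflow task that consumes the data."""
    time.sleep(0.05)
    return data.upper()


def run_with_prefetch(names):
    """Process inputs in order, fetching input i+1 while computing on input i."""
    results = []
    if not names:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the first transfer before any computation begins.
        pending = pool.submit(fetch, names[0])
        for next_name in names[1:]:
            data = pending.result()                 # wait for the current input
            pending = pool.submit(fetch, next_name) # start the next transfer
            results.append(compute(data))           # compute while it transfers
        results.append(compute(pending.result()))   # last input has no successor
    return results


if __name__ == "__main__":
    print(run_with_prefetch(["a", "b", "c"]))
```

In this sketch the task code (`compute`) is unchanged; only the driver loop decides when transfers start, which loosely mirrors the abstract's point that prefetching is introduced without modifying workflow source code.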
optimisation, scheduling, storage management
R. D. Friese, N. R. Tallent, M. Schram, M. Halappanavar and K. J. Barker, "Optimizing Distributed Data-Intensive Workflows," 2018 IEEE International Conference on Cluster Computing (CLUSTER), Belfast, United Kingdom, 2018, pp. 279-289.