2015 IEEE 35th International Conference on Distributed Computing Systems (ICDCS)
Columbus, OH, USA
June 29, 2015 to July 2, 2015
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDCS.2015.43
Data-parallel computing frameworks (DCFs) such as MapReduce, Spark, and Dryad are widely used in big data and cloud computing, and they inject large numbers of flows into data center networks. In this paper, we design and implement FLOWPROPHET, a general framework for predicting traffic flows for DCFs. To this end, we analyze and summarize the common features of popular DCFs and gain a key insight: since application logic in DCFs is naturally expressed as a directed acyclic graph (DAG), the DAG contains the time and data dependencies needed for accurate flow prediction. Based on this insight, FLOWPROPHET extracts DAGs from user applications and uses their time and data dependencies to calculate, ahead of time and for all flows, the flow information 4-tuple (source, destination, flow size, establish time). We also provide a generic programming interface to FLOWPROPHET, so that current and future DCFs can deploy it readily. We implement FLOWPROPHET on both Spark and Hadoop and perform extensive evaluations on a testbed with 37 physical servers. Our implementation and experiments demonstrate that FLOWPROPHET predicts flows ahead of time and at minimal cost, achieving almost 100% accuracy in source, destination, and flow size predictions. With accurate predictions from FLOWPROPHET and a simple network scheduler, the job completion time of a Hadoop TeraSort benchmark is reduced by 12.52% on our cluster.
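As a rough illustration of the idea in the abstract, the sketch below derives (source, destination, flow size, establish time) tuples from a toy DAG of stages. It is not the framework's actual interface: the names `Stage`, `Flow`, and `predict_flows` are hypothetical, and the sketch assumes that each upstream task splits its output evenly across downstream tasks, that a flow is established when the upstream stage is estimated to finish, and that data staying on the same host produces no network flow.

```python
from dataclasses import dataclass
from typing import Dict, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """The 4-tuple described in the abstract."""
    source: str
    destination: str
    size_bytes: int
    establish_time: float


@dataclass
class Stage:
    """Hypothetical stage record: one node of the application DAG."""
    task_hosts: List[str]        # host on which each task of the stage runs
    output_bytes_per_task: int   # assumed (estimated) output size of one task
    est_finish_time: float       # assumed estimated finish time, in seconds


def predict_flows(stages: Dict[str, Stage],
                  edges: List[Tuple[str, str]]) -> List[Flow]:
    """Predict one flow per (upstream task, downstream task) pair on each edge."""
    flows = []
    for up_name, down_name in edges:
        up, down = stages[up_name], stages[down_name]
        # Assumption: output is partitioned evenly across downstream tasks.
        share = up.output_bytes_per_task // len(down.task_hosts)
        for src in up.task_hosts:
            for dst in down.task_hosts:
                if src == dst:
                    continue  # same-host transfer: no network flow
                # Assumption: the flow starts when the upstream stage finishes.
                flows.append(Flow(src, dst, share, up.est_finish_time))
    return flows


# Toy example: two map tasks feeding two reduce tasks.
stages = {
    "map": Stage(["h1", "h2"], output_bytes_per_task=400, est_finish_time=10.0),
    "reduce": Stage(["h2", "h3"], output_bytes_per_task=0, est_finish_time=25.0),
}
flows = predict_flows(stages, [("map", "reduce")])
for flow in flows:
    print(flow)
```

In this toy setup the data dependency (the DAG edge) determines the source and destination hosts and the flow size, while the time dependency (the upstream stage must finish first) determines the establish time. A real deployment would take these inputs from the DCF's scheduler rather than from hand-written records.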
Calculators, Sparks, Optimization, Prediction algorithms, Parallel processing, Context, Data mining
H. Wang et al., "FLOWPROPHET: Generic and Accurate Traffic Prediction for Data-Parallel Cluster Computing," 2015 IEEE 35th International Conference on Distributed Computing Systems (ICDCS), Columbus, OH, USA, 2015, pp. 349-358.