2016 IEEE International Conference on Cloud Engineering (IC2E) (2016)
April 4, 2016 to April 8, 2016
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/IC2E.2016.21
Traffic for a typical MapReduce job in a data center consists of multiple network flows. Traditionally, network resources have been allocated to optimize network-level metrics such as flow completion time or throughput. Some recent schemes use application-aware scheduling, which can shorten the average job completion time. However, most of them treat the core network as a black box with sufficient capacity, even though a single bottleneck link in the core network can hurt application performance. We design and implement a centralized flow-scheduling framework called Phurti with the goal of improving job completion time in a multi-tenant cluster shared among multiple Hadoop jobs. Phurti communicates with the Hadoop framework to retrieve job-level network traffic information and with the OpenFlow-based switches to learn the network topology. Phurti implements a novel heuristic called Smallest Maximum Sequential-traffic First (SMSF) that uses the collected application and network information to schedule traffic for MapReduce jobs. Our evaluation with real Hadoop workloads shows that, compared to application- and network-agnostic scheduling strategies, Phurti improves job completion time for 95% of the jobs, decreases average job completion time by 20% and tail job completion time by 13%, and scales well with the cluster size and the number of jobs.
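The abstract names the SMSF heuristic but does not define it. The sketch below illustrates one plausible reading of the name and should not be taken as the authors' algorithm: it assumes that a job's "maximum sequential traffic" is the largest number of bytes any single host must send or receive for that job, and that jobs with the smallest such value are served first. The function names, the flow tuple layout, and the host and job identifiers are all hypothetical.

```python
from collections import defaultdict


def max_sequential_traffic(flows):
    """Return the largest byte count any one host sends or receives.

    `flows` is a list of (src_host, dst_host, size_bytes) tuples.
    Assumption: traffic through one host's port is serialized, so the
    busiest port bounds how quickly the job's transfers can finish.
    """
    sent = defaultdict(int)
    received = defaultdict(int)
    for src, dst, size in flows:
        sent[src] += size
        received[dst] += size
    return max(list(sent.values()) + list(received.values()), default=0)


def smsf_order(jobs):
    """Order job ids by ascending maximum sequential traffic.

    `jobs` maps a job id to its list of flows. Ties are broken by job id
    so that the ordering is deterministic.
    """
    return sorted(jobs, key=lambda job_id: (max_sequential_traffic(jobs[job_id]), job_id))


# Hypothetical workload: three jobs sharing hosts h1, h2 and h3.
example_jobs = {
    "A": [("h1", "h2", 100), ("h1", "h3", 50)],  # h1 sends 150 bytes
    "B": [("h1", "h2", 60), ("h3", "h2", 30)],   # h2 receives 90 bytes
    "C": [("h2", "h3", 40)],                     # busiest port carries 40 bytes
}
print(smsf_order(example_jobs))  # ['C', 'B', 'A']
```

This sketch uses only per-host traffic. According to the abstract, Phurti also takes the network topology learned from the switches into account, which the sketch does not model.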
Network topology, Bandwidth, Measurement, Data transfer, Topology, Scheduling algorithms, Schedules
C. X. Cai, S. Saeed, I. Gupta, R. H. Campbell and F. Le, "Phurti: Application and Network-Aware Flow Scheduling for Multi-tenant MapReduce Clusters," 2016 IEEE International Conference on Cloud Engineering (IC2E), Berlin, Germany, 2016, pp. 161-170.