2013 IEEE Seventh International Symposium on Service-Oriented System Engineering (2013)
San Francisco, CA, USA
Mar. 25, 2013 to Mar. 28, 2013
DOI: http://doi.ieeecomputersociety.org/10.1109/SOSE.2013.64
Data skew is an important cause of stragglers in MapReduce-like cloud systems. In this paper, we propose a Skew-Aware Task Scheduling (SATS) mechanism for iterative applications in MapReduce-like systems. The mechanism exploits the similarity of data distributions across adjacent iterations of an iterative application to mitigate the straggler problem caused by data skew. It collects data distribution information while the tasks of the current iteration execute, and uses this information to guide data partitioning for the tasks of the next iteration. We implement the mechanism in the HaLoop system and deploy it on a cluster. Experiments show that the proposed mechanism can effectively handle data skew and improve load balancing.
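The abstract describes SATS only at a high level: statistics gathered while iteration i runs steer how intermediate keys are partitioned in iteration i+1. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' HaLoop (Java) implementation; all function names are hypothetical, and the balancing rule (a greedy longest-processing-time assignment of keys to reduce partitions, with hash partitioning as a fallback for keys not seen before) is a swapped-in heuristic chosen for illustration, not necessarily the algorithm SATS uses.

```python
from collections import Counter
import heapq
import zlib


def collect_key_distribution(records):
    """Count the records per key observed during the current iteration.
    (Hypothetical helper: stands in for SATS's statistics collection.)"""
    return Counter(key for key, _ in records)


def build_partition_plan(key_counts, num_partitions):
    """Assumed heuristic: greedy longest-processing-time assignment.
    Heaviest keys are placed first, each on the currently least-loaded partition."""
    heap = [(0, p) for p in range(num_partitions)]  # (load, partition id)
    heapq.heapify(heap)
    plan = {}
    # Sort by descending count, then by key, so the plan is deterministic.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        plan[key] = p
        heapq.heappush(heap, (load + count, p))
    return plan


def partition(key, plan, num_partitions):
    """Route a key using the plan from the previous iteration.
    Keys absent from the plan fall back to a stable hash (CRC32)."""
    if key in plan:
        return plan[key]
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_loads(records, assign, num_partitions):
    """Count how many records each partition receives under `assign`."""
    loads = [0] * num_partitions
    for key, _ in records:
        loads[assign(key)] += 1
    return loads


if __name__ == "__main__":
    # Iteration i: a skewed key distribution (key "a" dominates).
    records = [("a", 1)] * 6 + [("b", 1)] * 3 + [("c", 1)] * 2 + [("d", 1)]
    counts = collect_key_distribution(records)
    plan = build_partition_plan(counts, num_partitions=2)
    # Iteration i+1 is assumed to have a similar distribution, so reuse the plan.
    print(plan, partition_loads(records, lambda k: partition(k, plan, 2), 2))
```

Because the plan is built from the previous iteration's counts, it only helps to the extent that the key distribution stays similar between adjacent iterations, which is the assumption the abstract states for iterative applications. Note also that a single very heavy key cannot be split by key-level assignment alone; handling that case would require a finer-grained scheme than this sketch.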
Load Balancing, Data Skew, Task Scheduling, Cloud
D. Li, Y. Chen and R. H. Hai, "Skew-Aware Task Scheduling in Clouds," 2013 IEEE 7th International Symposium on Service Oriented System Engineering (SOSE 2013), Redwood City, 2013, pp. 341-346.