2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) (2014)
July 13, 2014 to July 15, 2014
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PAAP.2014.29
MapReduce is a distributed parallel computing framework for large-scale data processing with extensive applications. Hadoop MapReduce is the most widely used open-source implementation of the MapReduce framework because it is flexible to customize and simple to use. To prevent a relatively slow-running task, called a straggler, from slowing down the whole job, MapReduce speculatively launches a backup copy of the straggler on another node, aiming to reduce the job's finish time. Although many speculative execution strategies have been proposed for heterogeneous environments, none of them considers the impact of dynamic system load on the running time of tasks, so they may identify stragglers incorrectly. In this paper, we propose ERUL, a novel speculative execution strategy for heterogeneous environments that improves the estimation of tasks' remaining time. ERUL also overcomes some drawbacks of LATE that mislead speculative execution in some cases. The experimental results indicate that our Hadoop-ERUL strategy not only estimates the remaining execution time of running tasks more accurately, but also reduces job running time by 26% compared with Hadoop-LATE.
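For context, the baseline estimate that LATE uses can be sketched as follows. This is a minimal illustration of the published LATE heuristic (remaining time = (1 - progress score) / progress rate), not of ERUL, whose load-aware estimator is not described in this abstract. The names `TaskStatus`, `estimate_time_left`, and `pick_straggler` are hypothetical, and the thresholds LATE applies before speculating (slow-task and slow-node cutoffs, a cap on concurrent backups) are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskStatus:
    """Hypothetical snapshot of one running task."""
    task_id: str
    progress: float  # progress score in [0, 1]
    elapsed: float   # seconds since the task started


def estimate_time_left(task: TaskStatus) -> float:
    """LATE-style estimate: assumes the task keeps its average progress rate."""
    if task.progress <= 0 or task.elapsed <= 0:
        return float("inf")  # no progress observed yet, so no finite estimate
    progress_rate = task.progress / task.elapsed
    return (1.0 - task.progress) / progress_rate


def pick_straggler(tasks: List[TaskStatus]) -> Optional[TaskStatus]:
    """Return the running task with the longest estimated time left, if any."""
    if not tasks:
        return None
    return max(tasks, key=estimate_time_left)


if __name__ == "__main__":
    running = [
        TaskStatus("task_a", progress=0.5, elapsed=10.0),
        TaskStatus("task_b", progress=0.2, elapsed=10.0),
    ]
    candidate = pick_straggler(running)
    print(candidate.task_id, estimate_time_left(candidate))
```

Because this estimate assumes a constant progress rate, a task that is temporarily slowed by load on its node can look like a straggler; this is the kind of misjudgment the abstract attributes to strategies that ignore dynamic system load.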
Load modeling, Estimation, Distributed databases, Data models, Heuristic algorithms, Educational institutions, Open source software
H. Wu, K. Li, Z. Tang and L. Zhang, "A Heuristic Speculative Execution Strategy in Heterogeneous Distributed Environments," 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), Beijing, China, 2014, pp. 268-273.