2010 IEEE 26th International Conference on Data Engineering (ICDE 2010) (2010)
Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
ISBN: 978-1-4244-5445-7
pp: 681-684
Kristi Morton, Computer Science and Engineering Department, University of Washington, Seattle, USA
Abram Friesen, Computer Science and Engineering Department, University of Washington, Seattle, USA
Magdalena Balazinska, Computer Science and Engineering Department, University of Washington, Seattle, USA
Dan Grossman, Computer Science and Engineering Department, University of Washington, Seattle, USA
ABSTRACT
In parallel query-processing environments, accurate, time-oriented progress indicators could provide much utility given that inter- and intra-query execution times can have high variance. However, none of the techniques used by existing tools or available in the literature provide non-trivial progress estimation for parallel queries. In this paper, we introduce Parallax, the first such indicator. While several parallel data processing systems exist, the work in this paper targets environments where queries consist of a series of MapReduce jobs. Parallax builds on recently-developed techniques for estimating the progress of single-site SQL queries, but focuses on the challenges related to parallelism and variable execution speeds. We have implemented our estimator in the Pig system and demonstrate its performance through experiments with the PigMix benchmark and other queries running in a real, small-scale cluster.
CITATION

K. Morton, A. Friesen, M. Balazinska and D. Grossman, "Estimating the progress of MapReduce pipelines," 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA, USA, 2010, pp. 681-684.
doi:10.1109/ICDE.2010.5447919