2013 International Conference on Parallel and Distributed Systems (ICPADS) (2013)
Seoul, Korea (South)
Dec. 15, 2013 to Dec. 18, 2013
The data placement strategy greatly affects the efficiency of MapReduce. The current strategy takes only the map phase into account and optimizes the map time alone. However, the ignored shuffle phase can increase the total running time significantly in many jobs. We propose a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time. The huge search space makes it difficult to find an optimal data placement instance (DPI) rapidly. To address this problem, we propose an algorithm that prunes most of the search space and finds an optimal result quickly. The search space is first segmented in ascending order of potential map time. Within each segment, we propose an efficient method to construct a local optimal DPI with the minimal total time of the map and shuffle phases. To find the global optimal DPI, we scan the local optimal DPIs in order. We prove that the global optimal DPI can be found as the first local optimal DPI whose total time stops decreasing, which prunes the search space further. In practice, we find that at most fourteen local optimal DPIs are scanned among tens of thousands of segments with the pruning strategy. Extensive experiments with real trace data verify not only the theoretical analysis of our pruning strategy and construction method but also the optimality of OPTAS. The best improvement obtained in our experiments exceeds 40% compared with the existing strategy used by MapReduce.
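The ordered scan with early stopping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how segments are represented or how a local optimal DPI is constructed, so `segments`, `build_local_optimum`, and the toy timings are hypothetical placeholders. The sketch also assumes one reading of the stopping rule, namely that once a segment's local optimum fails to improve on the previous one, the previous one is the global optimum.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

S = TypeVar("S")  # a segment of the search space (hypothetical representation)
D = TypeVar("D")  # a data placement instance, DPI (hypothetical representation)


def scan_with_early_stop(
    segments: Iterable[S],
    build_local_optimum: Callable[[S], Tuple[D, float]],
) -> Tuple[Optional[D], float, int]:
    """Scan segments in ascending order of potential map time.

    `build_local_optimum` is an assumed callback returning a segment's
    local optimal DPI and its total (map + shuffle) time. The scan stops
    at the first segment whose total time is not lower than the best so far.
    Returns (best DPI, best total time, number of segments examined).
    """
    best_dpi: Optional[D] = None
    best_time = float("inf")
    scanned = 0
    for segment in segments:
        dpi, total_time = build_local_optimum(segment)
        scanned += 1
        if total_time >= best_time:
            # Total time stopped decreasing: prune all remaining segments.
            break
        best_dpi, best_time = dpi, total_time
    return best_dpi, best_time, scanned


# Toy data only: made-up total times for five segments, not measurements.
toy_times = [9.0, 7.0, 6.0, 6.5, 8.0]
result = scan_with_early_stop(
    range(len(toy_times)),
    lambda i: (i, toy_times[i]),  # the "DPI" here is just the segment index
)
print(result)  # (2, 6.0, 4)
```

In this toy run the scan examines four of the five segments and returns the third. The early exit is only safe under the property the abstract says is proven; on an arbitrary sequence of total times it could stop at a local rather than global minimum.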
Optimized production technology, Distributed databases, Conferences, Analytical models, Data models, Algorithm design and analysis, Educational institutions
C. Wang, Y. Qin, Z. Huang, Y. Peng, D. Li and H. Li, "OPTAS: Optimal Data Placement in MapReduce," 2013 International Conference on Parallel and Distributed Systems (ICPADS), Seoul, Korea (South), 2013, pp. 315-322.