2012 IEEE Fifth International Conference on Cloud Computing
IncMR: Incremental Data Processing Based on MapReduce
Honolulu, HI, USA
June 24-June 29
ISBN: 978-1-4673-2892-0
The MapReduce programming model is widely used for large-scale, one-time, data-intensive distributed computing, but it lacks the flexibility and efficiency needed to process small increments of new data. This paper proposes the IncMR framework for incrementally processing new data added to a large data set; IncMR takes the existing state as implicit input and combines it with the new data. Map tasks are created only for the new splits rather than for the entire set of splits, while reduce tasks fetch their inputs, including the state and the intermediate results of the new map tasks, from designated nodes or local nodes. Data locality is one of the main optimizations used in job scheduling. IncMR is implemented on Hadoop, is compatible with the original MapReduce interfaces, and is transparent to users. Experiments show that non-iterative algorithms written for MapReduce can be migrated to IncMR directly, without any modification, to obtain efficient incremental and continuous processing. IncMR is competitive and, in all studied cases, runs faster than reprocessing the entire data set.
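The incremental scheme summarized above can be illustrated with a minimal, single-process sketch in Python. It assumes, as a simplification the abstract does not confirm, that the saved state is the intermediate (key, values) output of earlier map tasks, so an unmodified reduce function can be rerun over old and new values together. The names `incremental_run`, `word_count_map`, and `word_count_reduce`, and the word-count job itself, are hypothetical illustrations, not IncMR's or Hadoop's API; scheduling, data locality, and distribution across nodes are omitted.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical types for illustration only.
Record = str                  # one input record (a line of text)
State = Dict[str, List[int]]  # assumption: state = saved intermediate map output


def word_count_map(record: Record) -> Iterable[Tuple[str, int]]:
    """Ordinary MapReduce-style map function: emit (word, 1) per word."""
    for word in record.split():
        yield word, 1


def word_count_reduce(key: str, values: List[int]) -> int:
    """Ordinary MapReduce-style reduce function: sum the counts for one key."""
    return sum(values)


def incremental_run(
    state: State,
    new_splits: List[List[Record]],
    map_fn: Callable[[Record], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Tuple[Dict[str, int], State]:
    """Map only the new splits, then reduce over saved state plus new output."""
    # Map phase: one "task" per new split; previously processed splits are not re-mapped.
    new_intermediate: State = defaultdict(list)
    for split in new_splits:
        for record in split:
            for key, value in map_fn(record):
                new_intermediate[key].append(value)

    # Combine the saved intermediate state with the new intermediate results
    # (copied so the caller's state object is not mutated).
    merged: State = {key: list(values) for key, values in state.items()}
    for key, values in new_intermediate.items():
        merged.setdefault(key, []).extend(values)

    # Reduce phase: the unmodified reduce function sees every value for each key.
    results = {key: reduce_fn(key, values) for key, values in merged.items()}
    return results, merged


# Usage: two increments of data arriving one after the other.
state: State = {}
results, state = incremental_run(state, [["a b a"]], word_count_map, word_count_reduce)
results, state = incremental_run(state, [["b c"]], word_count_map, word_count_reduce)
print(results)  # → {'a': 2, 'b': 2, 'c': 1}
```

Keeping intermediate results rather than final outputs as the state is what lets the reduce function run unchanged in this sketch; a variant that stored only final outputs would need a reduce function able to merge its own results, which holds for sums but not for every algorithm.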
Index Terms:
Computational modeling, Data processing, Algorithm design and analysis, Data models, Programming, Parallel processing, Distributed databases, Compatible, MapReduce, Incremental data processing, State, Data locality
Citation:
Cairong Yan, Xin Yang, Ze Yu, Min Li, Xiaolin Li, "IncMR: Incremental Data Processing Based on MapReduce," CLOUD, pp. 534-541, 2012 IEEE Fifth International Conference on Cloud Computing, 2012