Honolulu, HI, USA
June 24, 2012 to June 29, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CLOUD.2012.67
The MapReduce programming model is widely used for large-scale, one-time, data-intensive distributed computing, but it lacks the flexibility and efficiency needed to process small increments of data. This paper proposes IncMR, a framework for incrementally processing new data added to a large data set; it takes the state as implicit input and combines it with the new data. Map tasks are created for the new splits only, rather than for all splits, while reduce tasks fetch their inputs, including the state and the intermediate results of the new map tasks, from designated nodes or local nodes. Data locality is treated as one of the main optimization criteria for job scheduling. IncMR is implemented on Hadoop, is compatible with the original MapReduce interfaces, and is transparent to users. Experiments show that non-iterative algorithms running in the MapReduce framework can be migrated to IncMR directly, without any modification, to obtain efficient incremental and continuous processing. IncMR is competitive and, in all studied cases, runs faster than reprocessing the entire data set.
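The idea of combining saved state with newly arrived data can be illustrated with a small in-memory sketch. This is not IncMR's implementation and uses no Hadoop APIs; the names `run_incremental`, `map_word_count`, and `reduce_sum` are hypothetical, and the sketch assumes a reduce function (such as a sum) whose previous output can be fed back in as an ordinary input value.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_word_count(split: str) -> List[Tuple[str, int]]:
    """Hypothetical map function: emit (word, 1) for each word in a split."""
    return [(word, 1) for word in split.split()]


def reduce_sum(key: str, values: List[int]) -> int:
    """Hypothetical reduce function: sum all values for a key."""
    return sum(values)


def run_incremental(
    new_splits: Iterable[str],
    state: Dict[str, int],
    map_fn: Callable[[str], List[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Process only the new splits and merge them with the saved state.

    Assumption: reduce_fn is associative and commutative, so the previous
    result for a key can be treated as one more input value.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)

    # The state from the previous run is an implicit input to the reducers.
    for key, value in state.items():
        grouped[key].append(value)

    # Map tasks are created for the new splits only.
    for split in new_splits:
        for key, value in map_fn(split):
            grouped[key].append(value)

    # Each reducer sees the old state plus the new intermediate results.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# First run over the initial data, then an incremental run over new data.
initial_state = run_incremental(["a b a"], {}, map_word_count, reduce_sum)
updated_state = run_incremental(["b c"], initial_state, map_word_count, reduce_sum)

# Reference: recompute from scratch over the entire data set.
full_result = run_incremental(["a b a", "b c"], {}, map_word_count, reduce_sum)
```

In this sketch the incremental run touches only the new split `"b c"` yet yields the same counts as recomputing over both splits, which is the property the abstract relies on for non-iterative algorithms.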
Computational modeling, Data processing, Algorithm design and analysis, Data models, Programming, Parallel processing, Distributed databases, Compatible, MapReduce, Incremental data processing, State, Data locality
Cairong Yan, Xin Yang, Ze Yu, Min Li, Xiaolin Li, "IncMR: Incremental Data Processing Based on MapReduce", IEEE International Conference on Cloud Computing (CLOUD), 2012, pp. 534-541, doi:10.1109/CLOUD.2012.67