2014 Second International Conference on Advanced Cloud and Big Data (CBD) (2014)
Nov. 20, 2014 to Nov. 22, 2014
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CBD.2014.57
Hadoop, a popular open-source implementation of MapReduce, is widely used for large-scale data-intensive applications such as data mining, web indexing, and scientific computing. The current Hadoop implementation assumes that the nodes in a cluster are homogeneous, and the Hadoop Distributed File System (HDFS) distributes data across nodes based on disk space availability. Such a data placement strategy is very efficient in homogeneous environments, where nodes are identical in both computing power and disk capacity. In practice, however, the homogeneity assumption does not always hold. In heterogeneous environments, Hadoop's scheduler suffers severe performance degradation and wasted energy when it relies on the default HDFS data placement strategy. In this paper, we propose a novel snakelike data placement mechanism (SLDP) for large-scale heterogeneous Hadoop clusters. SLDP first uses a heterogeneity-aware algorithm to divide the nodes into several virtual storage tiers (VSTs), and then places data blocks across the nodes in each VST in a snakelike (circuitous) order according to the hotness of the data. Furthermore, SLDP uses hotness-proportional replication to reduce disk space consumption and provides an effective power control function. Experimental results on two real data-intensive applications show that SLDP is energy-efficient, saves storage space, and significantly improves MapReduce performance in a heterogeneous Hadoop cluster.
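The abstract does not give the SLDP algorithm in detail, so the following is only a minimal sketch of the two-step idea (tiering, then snakelike placement by hotness) under stated assumptions. The function names, the single "power" score used to rank nodes, the equal-sized tiers, and the even split of hotness-ranked blocks across tiers are all hypothetical choices made for illustration; they are not taken from the paper.

```python
from itertools import cycle


def divide_into_tiers(node_power, num_tiers):
    """Group nodes into tiers by descending power.

    ASSUMPTION: a single numeric score per node stands in for the paper's
    heterogeneity-aware metric, and tiers are of (roughly) equal size.
    """
    ranked = sorted(node_power, key=node_power.get, reverse=True)
    size = -(-len(ranked) // num_tiers)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def snake_order(nodes):
    """Walk the nodes forward, then backward, repeatedly: a, b, c, c, b, a, a, ..."""
    return cycle(nodes + nodes[::-1])


def place_blocks(block_hotness, tiers):
    """Assign each block to a node, hottest blocks to the first (fastest) tier.

    ASSUMPTION: blocks are split into equal hotness bands, one band per tier.
    Within a tier, blocks are handed out in snake order.
    """
    hot_first = sorted(block_hotness, key=block_hotness.get, reverse=True)
    band = -(-len(hot_first) // len(tiers))  # blocks per tier, rounded up
    placement = {}
    for t, tier in enumerate(tiers):
        walker = snake_order(tier)
        for block in hot_first[t * band:(t + 1) * band]:
            placement[block] = next(walker)
    return placement


# Hypothetical cluster: four nodes with made-up power scores, six blocks
# with made-up hotness values.
node_power = {"n1": 8, "n2": 4, "n3": 2, "n4": 1}
block_hotness = {"b1": 90, "b2": 70, "b3": 50, "b4": 30, "b5": 20, "b6": 10}

tiers = divide_into_tiers(node_power, num_tiers=2)
placement = place_blocks(block_hotness, tiers)
```

In this toy run the three hottest blocks land on the faster tier and the rest on the slower one, with the walk reversing direction at the end of each tier instead of wrapping around to its first node. Hotness-proportional replication and power control are outside the scope of this sketch.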
Peer-to-peer computing, Servers, Big data, Clustering algorithms, Distributed databases, Google, Power demand
R. Xiong, J. Luo and F. Dong, "SLDP: A Novel Data Placement Strategy for Large-Scale Heterogeneous Hadoop Cluster," 2014 Second International Conference on Advanced Cloud and Big Data (CBD), Huangshan, China, 2014, pp. 9-17.