2015 Ninth International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS)
Santa Catarina, Brazil
July 8, 2015 to July 10, 2015
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CISIS.2015.37
The Hadoop framework has been developed to effectively process data-intensive MapReduce applications. Hadoop users specify the computation logic of an application in terms of a map and a reduce function; such applications are often termed MapReduce applications. The Hadoop Distributed File System (HDFS) stores the MapReduce application data on Hadoop cluster nodes called Data nodes, whereas the Name node is the control point for all Data nodes. While resilience is thereby increased, Hadoop's current data-distribution methodologies are not necessarily efficient for heterogeneous distributed environments such as public clouds. This work contends that existing data-distribution techniques are not necessarily suitable, since the performance of Hadoop typically degrades in heterogeneous environments whenever data distribution is not determined according to the computing capability of the nodes. Data locality is a key factor, since it affects performance in the Map phase when tasks are scheduled. Task scheduling techniques in Hadoop should therefore arguably consider data locality to enhance performance. Various task scheduling techniques have been analysed to understand their data-locality awareness when scheduling applications. Other system factors also play a major role in achieving high performance in Hadoop data processing. The main contribution of this work is a novel methodology for data placement on Hadoop Data nodes based on their computing ratio. Two standard MapReduce applications, Word Count and Grep, have been executed, and a significant performance improvement has been observed with our proposed data-distribution technique.
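The abstract does not spell out how the computing ratio is measured or how blocks are assigned, so the following is only a minimal sketch of one plausible reading: each Data node receives a share of the input blocks proportional to its computing ratio. The function name `allocate_blocks`, the node names, and the largest-remainder rounding are assumptions made for illustration; they are not taken from the paper and do not use any Hadoop API.

```python
from typing import Dict


def allocate_blocks(total_blocks: int, computing_ratio: Dict[str, float]) -> Dict[str, int]:
    """Split total_blocks across nodes in proportion to their computing ratio.

    Hypothetical helper: the ratio is assumed to be a positive number per node,
    where a larger value means a faster node.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not computing_ratio or any(r <= 0 for r in computing_ratio.values()):
        raise ValueError("every node needs a positive computing ratio")

    total_ratio = sum(computing_ratio.values())

    # Ideal fractional share of blocks for each node.
    shares = {node: total_blocks * ratio / total_ratio
              for node, ratio in computing_ratio.items()}

    # Start from the integer part of each share.
    allocation = {node: int(share) for node, share in shares.items()}

    # Hand out the blocks lost to rounding, largest fractional remainder first.
    leftover = total_blocks - sum(allocation.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - allocation[n], reverse=True)
    for node in by_remainder[:leftover]:
        allocation[node] += 1

    return allocation


if __name__ == "__main__":
    # Three hypothetical Data nodes; node-c is assumed to be three times as fast as node-a.
    print(allocate_blocks(12, {"node-a": 1, "node-b": 2, "node-c": 3}))
```

Under these assumptions, a node that processes data twice as fast holds roughly twice as many blocks, so more Map tasks can run on local data instead of fetching blocks from slower nodes.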
Time factors, Distributed databases, Processor scheduling, Bandwidth, Random access memory, Cloud computing, Data processing
V. Ubarhande, A. Popescu and H. Gonzalez-Velez, "Novel Data-Distribution Technique for Hadoop in Heterogeneous Cloud Environments," 2015 Ninth International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS), Santa Catarina, Brazil, 2015, pp. 217-224.