2014 IEEE 7th International Conference on Cloud Computing (CLOUD) (2014)
Anchorage, AK, USA
June 27, 2014 to July 2, 2014
ISSN: 2159-6190
ISBN: 978-1-4799-5062-1
pp: 594-601
ABSTRACT
With the rapid development of big data and cloud computing, big data analytics as a service in the cloud is becoming increasingly popular. More and more individuals and organizations rent virtual clusters to store and analyze data rather than building their own data centers. However, in a virtualized environment, it is not clear whether scaling out by using a cluster with more nodes processes big data better than scaling up by adding more resources to the existing virtual machines (VMs) in the cluster. In this paper, we study the scalability and performance of Hadoop virtual clusters with cost taken into consideration. We first present the design and implementation of the VirtualMR platform, which provides users with scalable Hadoop virtual cluster services for MapReduce-based big data analytics. We then run a series of Hadoop benchmarks and real parallel machine learning algorithms to evaluate scalability under both the scale-up and scale-out methods. Finally, we integrate a resource monitoring module into our platform and propose a system tuner. By analyzing the monitored data, we dynamically adjust the Hadoop framework parameters and the virtual machine configuration to improve resource utilization and reduce rental cost. Experimental results show that the scale-up method outperforms the scale-out method for CPU-bound applications, while the opposite holds for I/O-bound applications. The results also verify that the system tuner is effective at increasing resource utilization and reducing rental cost.
INDEX TERMS
Big data, Virtualization, Monitoring, Benchmark testing, Parallel processing, Scalability, Resource management
CITATION

Y. He, X. Jiang, Z. Wu, K. Ye and Z. Chen, "Scalability Analysis and Improvement of Hadoop Virtual Cluster with Cost Consideration," 2014 IEEE 7th International Conference on Cloud Computing (CLOUD), Anchorage, AK, USA, 2014, pp. 594-601.
doi:10.1109/CLOUD.2014.85