2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (2011)
Anchorage, Alaska USA
May 16, 2011 to May 20, 2011
Cloud computing technologies play an increasingly important role in realizing data-intensive applications by offering a virtualized compute and storage infrastructure that can scale on demand. A programming model that has gained considerable interest in this context is MapReduce, which simplifies the processing of large-scale distributed data volumes, usually on top of a distributed file system layer. In this paper we report on a self-configuring adaptive framework for developing and optimizing data-intensive scientific applications on top of Cloud and Grid computing technologies and the Hadoop framework. Our framework relies on a MAPE-K loop, known from autonomic computing, for optimizing the configuration of data-intensive applications at three abstraction layers: the application layer, the MapReduce layer, and the resource layer. By evaluating monitored resources, the framework configures the layers and allocates the resources on a per-job basis. The evaluation of configurations relies on historic data and a utility function that ranks different configurations with respect to the costs they incur. The optimization framework has been integrated into the Vienna Grid Environment (VGE), a service-oriented application development environment for providing applications on HPC systems, clusters and Clouds as services. An experimental evaluation of our framework has been undertaken with a data-analysis application from the field of molecular systems biology.
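As a rough illustration of the planning step described above, the following minimal Python sketch ranks candidate configurations by a utility function over historic runtimes and feeds each observed runtime back into the knowledge base. It is not the framework's implementation: the `Config` fields, the history values, the prices and the form of the utility function are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Config:
    """Hypothetical configuration spanning two of the layers."""
    nodes: int      # resource layer: number of allocated nodes
    map_tasks: int  # MapReduce layer: concurrent map tasks per node


# Knowledge base: made-up runtimes (seconds) of earlier jobs per configuration.
HISTORY = {
    Config(2, 2): [620.0, 580.0],
    Config(4, 2): [330.0, 310.0],
    Config(8, 4): [180.0, 200.0],
}


def utility(config, runtime_s, price_per_node_hour=0.10, time_weight=0.0001):
    """Assumed cost model (lower is better): node rental cost plus weighted runtime."""
    rental = config.nodes * price_per_node_hour * runtime_s / 3600.0
    return rental + time_weight * runtime_s


def plan(history, available_nodes):
    """Analyze and plan: pick the cheapest known configuration that fits."""
    estimates = {
        cfg: mean(runs)
        for cfg, runs in history.items()
        if cfg.nodes <= available_nodes
    }
    if not estimates:
        raise ValueError("no known configuration fits the available resources")
    return min(estimates, key=lambda cfg: utility(cfg, estimates[cfg]))


def record(history, config, runtime_s):
    """Knowledge update: store the runtime observed after executing a job."""
    history.setdefault(config, []).append(runtime_s)


if __name__ == "__main__":
    chosen = plan(HISTORY, available_nodes=8)  # monitoring would supply this value
    print("selected:", chosen)
    record(HISTORY, chosen, 195.0)             # runtime reported after execution
```

In this sketch the monitoring and execution phases are reduced to a single argument and a single call; a real loop would obtain the node count from resource monitoring and submit the job with the selected settings before recording its runtime.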
Y. Kaniovskyi, S. Benkner and M. Koehler, "An Adaptive Framework for the Execution of Data-Intensive MapReduce Applications in the Cloud," 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), Anchorage, Alaska USA, 2011, pp. 1122-1131.