2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (2011)
Anchorage, Alaska USA
May 16, 2011 to May 20, 2011
ISSN: 1530-2075
ISBN: 978-0-7695-4577-6
pp: 1122-1131
Cloud computing technologies play an increasingly important role in realizing data-intensive applications by offering a virtualized compute and storage infrastructure that can scale on demand. A programming model that has attracted considerable interest in this context is MapReduce, which simplifies the processing of large-scale distributed data volumes, usually on top of a distributed file system layer. In this paper we report on a self-configuring adaptive framework for developing and optimizing data-intensive scientific applications on top of Cloud and Grid computing technologies and the Hadoop framework. Our framework relies on a MAPE-K loop, known from autonomic computing, for optimizing the configuration of data-intensive applications at three abstraction layers: the application layer, the MapReduce layer, and the resource layer. By evaluating monitored resources, the framework configures the layers and allocates the resources on a per-job basis. The evaluation of configurations relies on historic data and a utility function that ranks different configurations according to the costs they incur. The optimization framework has been integrated into the Vienna Grid Environment (VGE), a service-oriented application development environment for providing applications on HPC systems, clusters and Clouds as services. An experimental evaluation of our framework has been undertaken with a data-analysis application from the field of molecular systems biology.
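
The abstract does not give the utility function or the configuration parameters, so the following is only a minimal sketch of how a cost-based ranking over historic data might sit inside one MAPE-K iteration. Every name in it (`Config`, `estimate_runtime`, `cost`, `plan`, `mape_k_step`), the three example knobs, the flat node-hour price, and the default runtime are assumptions made for illustration, not the framework's actual interface or cost model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Config:
    """Hypothetical per-job configuration with one knob per layer."""
    input_split_mb: int  # application layer (assumed knob)
    reducers: int        # MapReduce layer (assumed knob)
    nodes: int           # resource layer (assumed knob)


def estimate_runtime(config, history, default=3600.0):
    """Knowledge: mean runtime in seconds of past runs with this configuration."""
    runs = [seconds for past, seconds in history if past == config]
    return mean(runs) if runs else default


def cost(config, history, price_per_node_hour=0.10):
    """Assumed cost model: node-hours consumed times a flat price."""
    hours = estimate_runtime(config, history) / 3600.0
    return config.nodes * hours * price_per_node_hour


def plan(candidates, history):
    """Analyze and Plan: rank candidates from cheapest to most expensive."""
    return sorted(candidates, key=lambda c: cost(c, history))


def mape_k_step(candidates, history, execute):
    """One loop iteration: plan, execute, monitor, and update the knowledge."""
    best = plan(candidates, history)[0]
    runtime = execute(best)          # Execute, then Monitor the observed runtime
    history.append((best, runtime))  # Knowledge update for the next job
    return best
```

Here `execute` stands in for whatever submits the job and reports its runtime. Because each observed runtime is appended to `history`, the ranking for the next job reflects the most recent measurements, which is the per-job adaptation the abstract describes.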

Y. Kaniovskyi, S. Benkner and M. Koehler, "An Adaptive Framework for the Execution of Data-Intensive MapReduce Applications in the Cloud," 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), Anchorage, Alaska USA, 2011, pp. 1122-1131.