2010 IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom 2010)
Indianapolis, Indiana, USA
Nov. 30, 2010 to Dec. 3, 2010
The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure services, offers a viable alternative to traditional servers and computing clusters. The MapReduce distributed data processing architecture has become the weapon of choice for data-intensive analyses in the clouds and in commodity clusters due to its excellent fault-tolerance features, scalability, and ease of use. Currently, there are several options for using MapReduce in cloud environments, such as using MapReduce as a service, setting up one’s own MapReduce cluster on cloud instances, or using specialized cloud MapReduce runtimes that take advantage of cloud infrastructure services. In this paper, we introduce Azure MapReduce, a novel MapReduce runtime built using the Microsoft Azure cloud infrastructure services. The Azure MapReduce architecture successfully leverages the high-latency, eventually consistent, yet highly scalable Azure infrastructure services to provide an efficient, on-demand alternative to traditional MapReduce clusters. Further, we evaluate the use and performance of MapReduce frameworks, including Azure MapReduce, in cloud environments for scientific applications, using sequence assembly and sequence alignment as use cases.
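The abstract does not spell out how a runtime can stay fault tolerant on top of high-latency, eventually consistent services. One widely used building block is a cloud queue with a visibility timeout (Azure Queue storage offers this): a retrieved message is hidden for a while instead of deleted, so if the worker dies before deleting it, the message reappears and another worker retries the task. The sketch below is a minimal, in-memory illustration of that idea applied to a word-count MapReduce job. It assumes, as a simplification, that map tasks are distributed through such a queue; `VisibilityTimeoutQueue`, `run_job`, and the logical clock are hypothetical names for illustration only, not the Azure SDK and not the authors' implementation.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, so delete() removes this exact message
class Message:
    task_id: int
    payload: str
    visible_at: int = 0  # logical time at which the message is visible again


class VisibilityTimeoutQueue:
    """Hypothetical in-memory stand-in for a cloud queue (not a real SDK).

    get() hides a message for `timeout` ticks instead of removing it;
    only an explicit delete() removes it for good.
    """

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout
        self.messages: list[Message] = []

    def put(self, task_id: int, payload: str) -> None:
        self.messages.append(Message(task_id, payload))

    def get(self, now: int) -> Message | None:
        for msg in self.messages:
            if msg.visible_at <= now:
                msg.visible_at = now + self.timeout  # hide, do not remove
                return msg
        return None

    def delete(self, msg: Message) -> None:
        self.messages.remove(msg)

    def __len__(self) -> int:
        return len(self.messages)


def map_words(text: str) -> list[tuple[str, int]]:
    # Map step: emit (word, 1) for every word in the input split.
    return [(w, 1) for w in text.split()]


def run_job(splits: list[str], crash_on_first_attempt: set[int], timeout: int = 3):
    queue = VisibilityTimeoutQueue(timeout)
    for i, split in enumerate(splits):
        queue.put(i, split)

    intermediate: dict[int, list[tuple[str, int]]] = {}
    attempts: dict[int, int] = defaultdict(int)
    now = 0
    while len(queue) > 0:
        msg = queue.get(now)
        if msg is not None:
            attempts[msg.task_id] += 1
            if msg.task_id in crash_on_first_attempt and attempts[msg.task_id] == 1:
                pass  # simulated worker crash: the message is never deleted
            else:
                # Store output keyed by task id, so a re-executed task
                # overwrites its earlier result instead of double-counting.
                intermediate[msg.task_id] = map_words(msg.payload)
                queue.delete(msg)
        now += 1  # advance the logical clock

    # Shuffle + reduce: group intermediate pairs by word and sum the counts.
    counts: dict[str, int] = defaultdict(int)
    for pairs in intermediate.values():
        for word, n in pairs:
            counts[word] += n
    return dict(counts), dict(attempts)
```

For example, `run_job(["a b a", "b c", "c c a"], {1})` simulates a crash on the first attempt at task 1; the task's message becomes visible again after the timeout, is retried, and the job still returns the full word counts, with task 1 recorded as attempted twice. Keying map output by task id is what makes the retry safe here: a duplicate execution replaces, rather than adds to, the earlier result.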
MapReduce, Cloud Computing, AzureMapReduce, Elastic MapReduce, Hadoop
Judy Qiu, Geoffrey Fox, Tak-Lon Wu, Thilina Gunarathne, "MapReduce in the Clouds for Science", 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pp. 565-572, 2010, doi:10.1109/CloudCom.2010.107