6th IEEE International Conference on E-Science (E-Science 2010)
Dec. 7, 2010 to Dec. 10, 2010
The MapReduce pattern popularized by Google has been used successfully in several scientific applications. This paper investigates whether a MapReduce approach using on-demand Cloud resources is beneficial for simulation tasks in Systems Biology, and whether it can be integrated seamlessly into a service-oriented scientific workflow framework. In particular, it presents an Amazon Elastic MapReduce implementation of the 13C-MFA (Metabolic Flux Analysis) Monte Carlo bootstrap approach, designed for integration into an existing BPEL-based scientific workflow system. Comparing a 64-node MapReduce cluster with a single-node computation reveals a total performance gain of up to a factor of 14, at a total cost of $11 for on-demand resources. The most critical performance factor is I/O: the application suffers because I/O operations on many small files are expensive on Amazon S3 and the Hadoop DFS.
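To illustrate how a Monte Carlo bootstrap maps onto the MapReduce pattern, the following is a minimal sketch and not the paper's implementation. It assumes a parametric bootstrap in which each map task perturbs the measurements with Gaussian noise and refits a model, and the reduce step merges partial sums into a mean and standard deviation. The names `fit_toy_model`, `map_task`, `reduce_task`, and `bootstrap` are hypothetical, and the "model fit" is a stand-in (the sample mean) rather than an actual 13C-MFA flux estimation.

```python
import random
import statistics
from functools import reduce


def fit_toy_model(sample):
    """Stand-in for a flux-fitting step (assumption): returns the sample mean."""
    return statistics.fmean(sample)


def map_task(task):
    """Map: perturb the measurements with Gaussian noise, refit, emit partial sums."""
    seed, measurements, sigma = task
    rng = random.Random(seed)  # one independent, reproducible stream per task
    resampled = [m + rng.gauss(0.0, sigma) for m in measurements]
    estimate = fit_toy_model(resampled)
    # Emit (count, sum, sum of squares) so that results can be merged in any order.
    return (1, estimate, estimate * estimate)


def reduce_task(a, b):
    """Reduce: merge two partial results by element-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def bootstrap(measurements, sigma, n_runs):
    """Run n_runs bootstrap samples; return mean and std. dev. of the estimates."""
    tasks = [(seed, measurements, sigma) for seed in range(n_runs)]
    # Local, sequential map; on a cluster each task would run on a worker node.
    partials = map(map_task, tasks)
    n, total, total_sq = reduce(reduce_task, partials)
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, variance ** 0.5


if __name__ == "__main__":
    mean, std = bootstrap([1.0, 2.0, 3.0], sigma=0.1, n_runs=200)
    print(f"bootstrap mean={mean:.4f}, std={std:.4f}")
```

Because every map task is independent and the reduce operation is associative, the work can be split across any number of nodes. In this sketch each task carries its input in memory; a deployment that reads and writes one small file per task would meet the I/O cost the abstract describes.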
cloud computing, computer bootstrapping, data analysis, distributed processing, Monte Carlo methods, service-oriented architecture, workflow management software
T. Dalman et al., "Metabolic Flux Analysis in the Cloud," 6th IEEE International Conference on E-Science (E-Science 2010), Brisbane, QLD, 2011, pp. 57-64.