E-Science 2010. 6th IEEE International Conference on E-Science (E-Science 2010) (2010)
Brisbane, QLD
Dec. 7, 2010 to Dec. 10, 2010
ISBN: 978-1-4244-8957-2
pp: 57-64
The MapReduce pattern popularized by Google has been used successfully in several scientific applications. This paper investigates whether a MapReduce approach that uses on-demand resources from a Cloud is beneficial for simulation tasks in Systems Biology, and whether it can be integrated seamlessly into a service-oriented scientific workflow framework. In particular, it presents an Amazon Elastic MapReduce Cloud implementation of the 13C-MFA (Metabolic Flux Analysis) Monte Carlo bootstrap approach, aimed at integration into an existing BPEL-based scientific workflow system. A comparison of a 64-node MapReduce cluster with a single-node computation reveals a total performance gain of up to a factor of 14, at a total cost of $11 for on-demand resources. The most critical factor in terms of performance is I/O: our application suffers because I/O operations on many small files are expensive with Amazon S3 and the Hadoop DFS.
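As a rough illustration of why a Monte Carlo bootstrap fits the MapReduce pattern, the following minimal Python sketch runs each bootstrap sample as an independent map task and summarizes the estimates in a reduce step. It is not the authors' implementation: the measurement values, the noise level `SIGMA`, the function names, and the toy "fit" (a sample mean standing in for a real 13C-MFA flux estimation) are all assumptions made for this example, and it runs in a single process rather than on Hadoop or Amazon Elastic MapReduce.

```python
import random
import statistics
from typing import Dict, Iterable, List, Tuple

# Hypothetical measurements and noise level; these are NOT taken from the paper.
MEASUREMENTS = [9.8, 10.1, 10.3, 9.9, 10.0]
SIGMA = 0.2


def map_sample(seed: int) -> Tuple[str, float]:
    """Map step: perturb the measurements once and refit the toy model."""
    rng = random.Random(seed)  # independent, reproducible stream per task
    perturbed = [m + rng.gauss(0.0, SIGMA) for m in MEASUREMENTS]
    # Toy "fit": the sample mean stands in for a real flux estimation.
    return ("flux", statistics.fmean(perturbed))


def reduce_samples(
    pairs: Iterable[Tuple[str, float]]
) -> Dict[str, Tuple[float, float, float]]:
    """Reduce step: group estimates by key, then report mean and 95% interval."""
    grouped: Dict[str, List[float]] = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)

    summary = {}
    for key, values in grouped.items():
        values.sort()
        n = len(values)
        lower = values[int(0.025 * (n - 1))]  # empirical 2.5th percentile
        upper = values[int(0.975 * (n - 1))]  # empirical 97.5th percentile
        summary[key] = (statistics.fmean(values), lower, upper)
    return summary


def run_bootstrap(n_samples: int) -> Dict[str, Tuple[float, float, float]]:
    """Run all map tasks (sequentially here), then the reduce step."""
    return reduce_samples(map_sample(seed) for seed in range(n_samples))


if __name__ == "__main__":
    mean, lower, upper = run_bootstrap(200)["flux"]
    print(f"estimate={mean:.3f}, 95% interval=[{lower:.3f}, {upper:.3f}]")
```

Because each map task depends only on its seed, the samples could in principle be distributed across nodes without coordination; only the reduce step needs all of the results together.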
cloud computing, computer bootstrapping, data analysis, distributed processing, Monte Carlo methods, service-oriented architecture, workflow management software

T. Dalman et al., "Metabolic Flux Analysis in the Cloud," 6th IEEE International Conference on E-Science (E-Science 2010), Brisbane, QLD, 2010, pp. 57-64.