2015 IEEE 11th International Conference on e-Science (e-Science) (2015)
Aug. 31, 2015 to Sept. 4, 2015
Workflows have become a popular means for implementing experiments in computational sciences. They offer advantages over other forms of implementation: they require a formalisation of the experiment process, provide a standard set of functions, and abstract from the underlying system. Thus, they facilitate the understandability and repeatability of experimental research. In addition, metadata standards such as Research Objects, which allow further metadata about the research process to be recorded, are intended to enable better reproducibility of experiments. However, as several studies have shown, merely implementing an experiment as a workflow in a workflow engine is not sufficient to achieve these goals, as a number of challenges and pitfalls remain. In this paper, we quantify how many workflow executions are easy to repeat. To this end, we automatically obtain and analyse a set of almost 1,500 workflows available on the myExperiment platform, focusing on those authored in the Taverna workflow language. We provide statistics on the types of processing steps used, and investigate which vulnerabilities with regard to re-execution they face. We then try to execute the workflows automatically. From these results, we identify the most common causes of failure, and analyse how these can be countered with existing or yet-to-be-developed approaches.
Program processors, Engines, Java, Web services, Ports (Computers), Libraries
R. Mayer and A. Rauber, "A Quantitative Study on the Re-executability of Publicly Shared Scientific Workflows," 2015 IEEE 11th International Conference on e-Science (e-Science), Munich, Germany, 2015, pp. 312-321.