Issue No. 03 - March (2008 vol. 19)
Thomas Fahringer, IEEE
Radu Prodan, IEEE
Scientific workflows are a topic of great interest in the Grid community, which sees the workflow model as an attractive paradigm for programming distributed wide-area Grid infrastructures. Traditionally, Grid workflow execution is approached as a pure best-effort scheduling problem that maps activities onto Grid processors using appropriate optimisation or local matchmaking heuristics so that the overall execution time is minimised. Even though such heuristics often deliver effective results, execution in dynamic and unpredictable Grid environments is prone to severe performance losses, which must be understood in order to minimise completion time or to use high-performance resources efficiently. In this paper, we propose a new systematic approach to help scientists and middleware developers understand the most severe sources of performance loss that occur when executing scientific workflows in dynamic Grid environments. We introduce an ideal model for the lowest execution time that a workflow can achieve and explain its difference from the real measured Grid execution time through a hierarchy of performance overheads for Grid computing. We describe how to systematically measure and compute the overheads, from individual activities up to larger workflow regions, and adapt well-known parallel processing metrics, including speedup and efficiency, to the scope of Grid computing. We present a distributed online tool for computing and analysing the performance overheads in real time based on event correlation techniques, and we introduce several performance contracts as quality-of-service parameters to be enforced during workflow execution, beyond traditional best-effort practices. We illustrate our method through post-mortem and online performance analysis of two real-world workflow applications executed in the Austrian Grid environment.
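The core bookkeeping idea of the abstract, explaining the gap between an ideal execution time and the measured Grid execution time through overheads, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: the names `WorkflowRun`, `total_overhead` and `unidentified_overhead`, as well as the overhead category labels, are hypothetical. The sketch assumes that the total overhead is simply the measured time minus the ideal time, and that any part of it not covered by explicitly measured categories is left as an unexplained remainder. The speedup and efficiency functions use the textbook parallel processing definitions; the paper adapts these metrics to Grid computing, and its exact formulas are not given in this abstract.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowRun:
    """Timing data for one workflow execution, all values in seconds.

    Hypothetical structure for illustration; not the paper's notation.
    """
    ideal_time: float      # lowest achievable execution time under an ideal model
    measured_time: float   # real execution time observed on the Grid
    # Overheads that were explicitly measured, keyed by a (hypothetical) category name.
    measured_overheads: dict[str, float] = field(default_factory=dict)


def total_overhead(run: WorkflowRun) -> float:
    # Assumption: all time spent beyond the ideal execution time counts as overhead.
    return run.measured_time - run.ideal_time


def unidentified_overhead(run: WorkflowRun) -> float:
    # Part of the total overhead not explained by any measured category.
    return total_overhead(run) - sum(run.measured_overheads.values())


def speedup(sequential_time: float, parallel_time: float) -> float:
    # Textbook definition: how many times faster than a sequential reference run.
    return sequential_time / parallel_time


def efficiency(sequential_time: float, parallel_time: float, processors: int) -> float:
    # Textbook definition: speedup normalised by the number of processors used.
    return speedup(sequential_time, parallel_time) / processors


if __name__ == "__main__":
    run = WorkflowRun(
        ideal_time=100.0,
        measured_time=160.0,
        measured_overheads={"data transfer": 25.0, "middleware": 20.0},  # made-up example values
    )
    print(total_overhead(run))              # 60.0
    print(unidentified_overhead(run))       # 15.0
    print(speedup(1200.0, 160.0))           # 7.5
    print(efficiency(1200.0, 160.0, 10))    # 0.75
```

Keeping an explicit remainder for unexplained time makes the decomposition honest: if the measured categories do not account for the whole gap, the shortfall stays visible instead of being silently attributed to one category.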
Distributed systems, distributed applications, distributed/Internet based software engineering tools and techniques, performance measurements, monitors, performance evaluation, performance attributes
Thomas Fahringer, Radu Prodan, "Overhead Analysis of Scientific Workflows in Grid Environments", IEEE Transactions on Parallel & Distributed Systems, vol. 19, no. 3, pp. 378-393, March 2008, doi:10.1109/TPDS.2007.70734