Issue No.03 - March (2008 vol.19)
pp: 378-393
ABSTRACT
Scientific workflows are a topic of great interest in the Grid community, which sees in the workflow model an attractive paradigm for programming distributed wide-area Grid infrastructures. Traditionally, Grid workflow execution is approached as a pure best-effort scheduling problem that maps the activities onto the Grid processors based on appropriate optimisation or local matchmaking heuristics such that the overall execution time is minimised. Even though such heuristics often deliver effective results, execution in dynamic and unpredictable Grid environments is prone to severe performance losses that must be understood in order to minimise the completion time or to use high-performance resources efficiently. In this paper, we propose a new systematic approach to help scientists and middleware developers understand the most severe sources of performance losses that occur when executing scientific workflows in dynamic Grid environments. We introduce an ideal model for the lowest execution time that can be achieved by a workflow and explain the difference from the real measured Grid execution time based on a hierarchy of performance overheads for Grid computing. We describe how to systematically measure and compute the overheads from individual activities to larger workflow regions and adjust well-known parallel processing metrics, including speedup and efficiency, to the scope of Grid computing. We present a distributed online tool for computing and analysing the performance overheads in real time based on event correlation techniques and introduce several performance contracts as quality of service parameters to be enforced during the workflow execution beyond traditional best-effort practices. We illustrate our method through post-mortem and online performance analysis of two real-world workflow applications executed in the Austrian Grid environment.
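As a rough illustration of the quantities named in the abstract, the sketch below treats the total overhead as the gap between the measured and the ideal execution time, and computes speedup and efficiency with their textbook parallel-processing definitions. The abstract does not give the paper's Grid-adjusted formulas, so the function names, parameters, and numbers here are assumptions for illustration only, not the authors' model.

```python
def total_overhead(ideal_seconds: float, measured_seconds: float) -> float:
    """Gap between measured and ideal execution time (illustrative assumption)."""
    return measured_seconds - ideal_seconds


def speedup(sequential_seconds: float, measured_seconds: float) -> float:
    """Textbook speedup: sequential time divided by measured parallel time."""
    return sequential_seconds / measured_seconds


def efficiency(sequential_seconds: float, measured_seconds: float,
               processors: int) -> float:
    """Textbook efficiency: speedup normalised by the number of processors."""
    return speedup(sequential_seconds, measured_seconds) / processors


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper.
    print(total_overhead(ideal_seconds=100.0, measured_seconds=130.0))
    print(speedup(sequential_seconds=400.0, measured_seconds=100.0))
    print(efficiency(sequential_seconds=400.0, measured_seconds=100.0,
                     processors=8))
```

A hierarchy of overheads, as the abstract describes, would further split the total overhead into named categories; that breakdown is left out here because the abstract does not list them.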
INDEX TERMS
Distributed systems, distributed applications, distributed/Internet based software engineering tools and techniques, performance measurements, monitors, performance evaluation, performance attributes
CITATION
Radu Prodan, Thomas Fahringer, "Overhead Analysis of Scientific Workflows in Grid Environments", IEEE Transactions on Parallel & Distributed Systems, vol.19, no. 3, pp. 378-393, March 2008, doi:10.1109/TPDS.2007.70734