Research on workflow management system design has drawn on the mobile agent computing paradigm, in which agents have been shown to increase the total capacity of a workflow system by decoupling execution management from a statically designated workflow engine. Coordinating fault tolerance mechanisms, however, has been shown to be a drawback because it increases overall execution times. To address this issue, we develop a model for comparing the effects of two fault tolerance techniques: local and remote checkpointing. The model enables an examination of how fault tolerance coordination affects execution time while taking into account the dynamic nature of a workflow environment. One proposed use of the model is to select and configure agent-based fault tolerance approaches in response to changes in environmental variables. This approach allows the owners of a workflow management system to reap the scaling efficiency benefits of the mobile agent paradigm without being forced to make trade-offs in execution performance.
Citation:
Jason Nichols, Haluk Demirkan, Michael Goul, "Towards a Model of Fault Tolerance Technique Selection in Static and Dynamic Agent-Based Inter-Organizational Workflow Management Systems," Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05), Track 7, p. 188c, 2005.