Cluster Computing and the Grid, IEEE International Symposium on (2008)
May 19, 2008 to May 22, 2008
ISBN: 978-0-7695-3156-4
pp: 467-474
The Grid is inherently unreliable due to its geographical dispersion, heterogeneity, and the involvement of multiple administrative domains. The most general class of failures is that of so-called Byzantine failures, where no assumptions can be made about the behavior of faulty components. In this paper, a novel system is described that diagnoses and tolerates Byzantine faults based on service replication. We suggest, briefly describe, and compare two fail-stop and two Byzantine fault tolerance algorithms. Because many larger-scale scientific Grid applications have complex outputs, comparing replica results, as needed to implement Byzantine fault tolerance, becomes a non-trivial task. We therefore include an automation mechanism for this particular problem, based on a generic description language and code generation. Our approach has been implemented as an extension to the Otho Toolkit, a system that synthesizes tailor-made wrapper services for a given application, Grid environment, and resource. An analysis of performance and overheads for three real-world applications completes our work.
Grid, HPC, Fault Tolerance, Byzantine Fault Tolerance
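The abstract does not spell out the paper's algorithms, so the following is only a minimal sketch of the general idea it names: tolerating Byzantine faults by replicating a service and comparing replica results. It assumes 2f + 1 replicas of a deterministic service and accepts a result once f + 1 replicas agree on it. The names `vote` and `close_enough` are hypothetical and are not taken from the Otho Toolkit; the tolerance-based comparator stands in for the application-specific comparison of complex outputs that the paper automates.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def vote(results: Sequence[T], f: int,
         equivalent: Callable[[T, T], bool]) -> Optional[T]:
    """Return a result supported by at least f + 1 replicas, else None.

    Assumption: at most f replicas are faulty, so any result backed by
    f + 1 equivalent replies includes at least one correct replica.
    """
    for candidate in results:
        # Count every reply (including the candidate itself) that matches.
        support = sum(1 for other in results if equivalent(candidate, other))
        if support >= f + 1:
            return candidate
    return None


def close_enough(a: Sequence[float], b: Sequence[float],
                 tol: float = 1e-6) -> bool:
    """Hypothetical comparator: element-wise equality within a tolerance."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


# Three replicas (f = 1): two agree up to rounding, one returns garbage.
replies = [[1.0, 2.0], [1.0, 2.0000001], [9.9, 2.0]]
accepted = vote(replies, f=1, equivalent=close_enough)
```

A tolerance-based comparison is not transitive, so a real comparator for complex scientific outputs would need more care than this sketch takes.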

J. Hofer and T. Fahringer, "Synthesizing Byzantine Fault-Tolerant Grid Application Wrapper Services," 2008 8th International Symposium on Cluster Computing and the Grid (CCGRID '08), Lyon, 2008, pp. 467-474.