2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
Luxembourg City, Luxembourg
Jun 25, 2018 to Jun 28, 2018
Modern large-scale distributed systems increasingly espouse sophisticated distributed architectures characterized by complex distributed structural invariants. Unfortunately, maintaining these structural invariants at scale is time-consuming and error-prone, as developers must take into account asynchronous failures, loosely coordinated sub-systems, and network delays. To address this problem, we propose PLEIADES, a new framework to construct and enforce large-scale distributed structural invariants under aggressive conditions. PLEIADES combines the resilience of self-organizing overlays with the expressiveness of an assembly-based design strategy. The result is a highly survivable framework that is able to dynamically maintain arbitrarily complex distributed structures under aggressive crash failures. Our evaluation shows in particular that PLEIADES is able to restore the overall structure of a 25,600-node system in fewer than 11 asynchronous rounds after half of the nodes have crashed.
distributed processing, fault tolerant computing
S. Bouget, Y. Bromberg, A. Luxey and F. Taiani, "Pleiades: Distributed Structural Invariants at Scale," 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Luxembourg City, Luxembourg, 2018, pp. 542-553.