Enhancing throughput of partially replicated state machines via multi-partition operation scheduling
2017 IEEE 16th International Symposium on Network Computing and Applications (NCA) (2017)
Cambridge, MA, USA
Oct. 30, 2017 to Nov. 1, 2017
Zhongmiao Li, Université catholique de Louvain
Peter Van Roy, Université catholique de Louvain
Paolo Romano, Instituto Superior Técnico, Lisboa & INESC-ID
State-machine replication (SMR) is a fundamental technique to implement fault-tolerant services. Recently, various works have aimed at enhancing the scalability of SMR by exploiting partial replication techniques. By sharding the state machine across disjoint partitions, and replicating each partition over an independent group of processes, a Partially Replicated State Machine (PRSM) can process operations that involve a single partition by requiring synchronization only among the replicas of that partition, thereby achieving higher scalability than SMR. Unfortunately, existing PRSMs rely on inefficient mechanisms to coordinate the execution of multi-partition operations: they either impose global coordination across all nodes in the system or require inter-partition synchronization on the critical path of operation execution. As a result, the performance and scalability of existing PRSM systems are severely hindered in the presence of even a small fraction of multi-partition operations. This paper tackles this issue by presenting Genepi, a PRSM protocol that introduces a novel, highly efficient mechanism for regulating the execution of multi-partition operations. We show via an experimental evaluation based on both synthetic benchmarks and TPC-C that Genepi can achieve a throughput gain of up to 5.5× over existing PRSM systems, with only negligible latency overhead at low load.
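The distinction between single-partition and multi-partition operations can be illustrated with a minimal Python sketch. It is not Genepi's mechanism, which the abstract does not detail; it only shows how a sharded state machine might tell which replica groups an operation has to involve. The names `NUM_PARTITIONS`, `partition_of`, `Operation`, `partitions_touched` and `is_multi_partition`, as well as the byte-sum sharding function, are assumptions made for illustration.

```python
from dataclasses import dataclass

NUM_PARTITIONS = 4  # hypothetical number of disjoint partitions (shards)


def partition_of(key: str) -> int:
    """Map a key to a partition with a deterministic toy function.

    A byte sum is used instead of Python's built-in hash(), which is
    randomized per process for strings and would not be stable across replicas.
    """
    return sum(key.encode("utf-8")) % NUM_PARTITIONS


@dataclass(frozen=True)
class Operation:
    """An operation identified by the keys it reads or writes (hypothetical)."""
    keys: tuple


def partitions_touched(op: Operation) -> frozenset:
    """Return the set of partitions whose state the operation accesses."""
    return frozenset(partition_of(k) for k in op.keys)


def is_multi_partition(op: Operation) -> bool:
    """True if the operation spans more than one partition.

    Single-partition operations only need ordering among the replicas of
    one partition; multi-partition operations need some form of
    cross-partition coordination.
    """
    return len(partitions_touched(op)) > 1


if __name__ == "__main__":
    local_op = Operation(keys=("a", "e"))   # both keys map to the same partition
    cross_op = Operation(keys=("a", "b"))   # keys map to different partitions
    for op in (local_op, cross_op):
        kind = "multi-partition" if is_multi_partition(op) else "single-partition"
        print(op.keys, sorted(partitions_touched(op)), kind)
```

In this toy setting, only operations for which `is_multi_partition` returns true would require the cross-partition coordination whose cost the abstract identifies as the bottleneck of existing PRSM systems.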
Synchronization, Protocols, Throughput, Scalability, Fault tolerance, Fault tolerant systems
Z. Li, P. Van Roy and P. Romano, "Enhancing throughput of partially replicated state machines via multi-partition operation scheduling," 2017 IEEE 16th International Symposium on Network Computing and Applications (NCA), Cambridge, MA, USA, 2017, pp. 1-10.