2014 IEEE 33rd International Symposium on Reliable Distributed Systems (SRDS) (2014)
Oct. 6, 2014 to Oct. 9, 2014
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SRDS.2014.41
Cloud computing infrastructures rely on fault-tolerant, geographically distributed services to meet the requirements of modern applications. Each service handles a large number of clients that compete for the resources it offers, and it must scale when the load increases. In this paper, we investigate a scalability solution that consists of partitioning the service state. We formulate specific conditions under which a service is partitionable. We then present a general algorithm to build a dependable and consistent partitioned service. To assess the practicality of our approach, we implement and evaluate the ZooFence coordination service. ZooFence orchestrates several instances of ZooKeeper and presents the exact same API and semantics to its clients. It automatically splits the coordination service state among ZooKeeper instances while remaining transparent to the application. By reducing the convoy effect on operations and leveraging workload locality, our approach yields a coordination service that scales better than a single ZooKeeper instance. The evaluation of ZooFence assesses this claim on two benchmarks: a synthetic service of concurrent queues and the BookKeeper distributed logging engine.
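The abstract does not spell out how the state is split, so the following is only an illustrative sketch of the general idea of partitioning a hierarchical namespace across several service instances. It assumes a hypothetical longest-prefix mapping from paths to instances; the class `PartitionRouter` and the instance names `zk-0`, `zk-1`, `zk-2` are invented for this example and are not taken from ZooFence or from the ZooKeeper API.

```python
class PartitionRouter:
    """Maps hierarchical paths to partition names by longest matching prefix.

    Hypothetical helper for illustration only; not the ZooFence algorithm.
    """

    def __init__(self, default):
        self.default = default  # partition used when no rule matches
        self.rules = {}         # path prefix -> partition name

    def assign(self, prefix, partition):
        """Place the subtree rooted at `prefix` on `partition`."""
        self.rules[prefix.rstrip("/") or "/"] = partition

    def route(self, path):
        """Return the partition responsible for `path`."""
        node = path.rstrip("/") or "/"
        # Walk from the path up to the root; the first ancestor
        # that has a rule is the longest matching prefix.
        while True:
            if node in self.rules:
                return self.rules[node]
            if node == "/":
                return self.default
            node = node.rsplit("/", 1)[0] or "/"


router = PartitionRouter(default="zk-0")
router.assign("/queues/a", "zk-1")
router.assign("/queues/b", "zk-2")

for p in ["/queues/a/item-1", "/queues/b", "/config/x"]:
    print(p, "->", router.route(p))
```

In this sketch, operations on different queues land on different instances, which is one way workload locality could be exploited. Operations that span several partitions would need additional coordination, which the sketch does not attempt to model.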
History, Partitioning algorithms, Semantics, Nominations and elections, Parallel processing, Synchronization, Banking
R. Halalai, P. Sutra, E. Riviere and P. Felber, "ZooFence: Principled Service Partitioning and Application to the ZooKeeper Coordination Service," 2014 IEEE 33rd International Symposium on Reliable Distributed Systems (SRDS), Nara, Japan, 2014, pp. 67-78.