2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS) (2017)
Shenzhen, Guangdong, China
Dec 15, 2017 to Dec 17, 2017
Social media networks and online graph analytics operate on large-scale graphs with millions, in some cases billions, of vertices. Low-latency access is essential, but caching suffers from the mostly irregular access patterns of these application domains. Hence, distributed in-memory systems that keep all data in memory at all times have been proposed. These in-memory systems are typically not optimized for very large numbers of small data objects, which demands new concepts for local and global data management as well as for the fault-tolerance mechanisms that mask server failures and power outages. In this paper, we propose a novel backup distribution and parallel recovery approach aiming at fast recovery of servers storing hundreds of millions of small objects. All proposed concepts have been implemented within the open-source distributed system DXRAM and have been evaluated in the Microsoft Azure cloud with up to 72 high-performance virtual machines in two scale sets. For evaluation, we used two benchmarks: the Yahoo! Cloud Serving Benchmark and a recovery benchmark. The experiments show that the proposed recovery strategy can recover servers with 500,000,000 small data objects in less than 2 seconds and can efficiently mask server failures under heavy load. Furthermore, DXRAM outperforms the state-of-the-art system RAMCloud in additional recovery experiments with large objects (2.4× faster) and even more so with small objects (> 9× faster).
cloud computing, fault tolerant computing, graph theory, storage management, virtual machines
K. Beineke, S. Nothaas and M. Schoettner, "Fast Parallel Recovery of Many Small In-Memory Objects," 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS), Shenzhen, Guangdong, China, 2018, pp. 248-257.