2017 IEEE International Conference on Cluster Computing (CLUSTER) (2017)
Honolulu, Hawaii, United States
Sept. 5, 2017 to Sept. 8, 2017
Social media networks and online graph analytics operate on large-scale graphs with millions, and in some cases billions, of vertices. Low-latency access is essential, but caching suffers from the largely irregular access patterns of these application domains. Hence, distributed in-memory systems have been proposed that keep all data in memory at all times. However, the sheer number of small data objects demands new concepts for local and global data management, as well as for the fault-tolerance mechanisms needed to mask server failures and power outages. We propose a backup distribution mechanism and a parallel recovery concept that allow a failed server storing hundreds of millions of small objects to be recovered within 1 to 2 seconds. All proposed concepts have been implemented in the open-source system DXRAM and evaluated in the Microsoft Azure cloud with up to 72 virtual machines. The experiments show that DXRAM can recover a server storing 500,000,000 small objects from SSDs within 2 seconds.
Servers, Random access memory, Benchmark testing, Throughput, Cloud computing, Distributed databases, Memory management
K. Beineke, S. Nothaas and M. Schoettner, "Parallelized Recovery of Hundreds of Millions Small Data Objects," 2017 IEEE International Conference on Cluster Computing (CLUSTER), Honolulu, Hawaii, United States, 2017, pp. 621-622.