Fault Tolerance in Distributed Systems Using Fused Data Structures
April 2013 (vol. 24 no. 4)
pp. 701-715
B. Balasubramanian, Dept. of Electr. Eng., Princeton Univ., Princeton, NJ, USA
V. K. Garg, Electr. & Comput. Eng. Dept., Univ. of Texas at Austin, Austin, TX, USA
Replication is the prevalent solution for tolerating faults in large data structures hosted on distributed servers. To tolerate f crash faults (dead or unresponsive data structures) among n distinct data structures, replication requires f + 1 replicas of each data structure, resulting in nf additional backups. We present a solution, referred to as fusion, that uses a combination of erasure codes and selective replication to tolerate f crash faults using just f additional fused backups. We show that our solution achieves O(n) savings in space over replication. Further, we present a solution that tolerates f Byzantine faults (malicious data structures) and requires only nf + f backups, compared with the 2nf backups required by replication. We explore the theory of fused backups and provide a library of such backups for all the data structures in the Java Collection Framework. The theoretical and experimental evaluation confirms that fused backups are more space-efficient than replication while adding very little overhead to normal operation. To illustrate the practical usefulness of fusion, we use fused backups for reliability in Amazon's highly available key-value store, Dynamo. While the current replication-based solution uses 300 backup structures, our solution requires only 120 backup structures. This saves space as well as other resources such as power.
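To make the backup counts and the idea of a fused backup concrete, here is a minimal sketch under simplifying assumptions: f = 1, every primary is an equal-length array of integers, and the "erasure code" is plain XOR parity, the simplest code that survives one crash. This is a toy illustration, not the authors' library; the function names `replication_backups`, `fusion_backups`, `fuse`, and `recover` are hypothetical. Tolerating f > 1 crashes would need a more general erasure code (for example Reed-Solomon), and the abstract notes that fusion also uses selective replication, which this sketch omits.

```python
from functools import reduce
from operator import xor
from typing import List


def replication_backups(n: int, f: int, byzantine: bool = False) -> int:
    """Extra copies needed by replication: f per structure for crash faults
    (n*f total), 2f per structure for Byzantine faults (2*n*f total)."""
    return 2 * n * f if byzantine else n * f


def fusion_backups(n: int, f: int, byzantine: bool = False) -> int:
    """Backup counts stated for fusion: f for crash faults, n*f + f for
    Byzantine faults."""
    return n * f + f if byzantine else f


def fuse(primaries: List[List[int]]) -> List[int]:
    """Hypothetical toy fused backup for f = 1: the element-wise XOR of
    equal-length integer arrays (one parity array for all n primaries)."""
    return [reduce(xor, column) for column in zip(*primaries)]


def recover(survivors: List[List[int]], fused: List[int]) -> List[int]:
    """Rebuild the single crashed array by XOR-ing the fused backup with
    the corresponding element of every surviving primary."""
    return [
        reduce(xor, (s[i] for s in survivors), fused[i])
        for i in range(len(fused))
    ]


if __name__ == "__main__":
    a, b, c = [3, 1, 4], [1, 5, 9], [2, 6, 5]
    backup = fuse([a, b, c])                # one backup instead of three replicas
    print(recover([a, c], backup) == b)     # b crashed; rebuilt from a, c, backup
    print(replication_backups(3, 1), fusion_backups(3, 1))
```

With n = 3 and f = 1, replication keeps 3 backup copies while the XOR sketch keeps a single fused array; for Byzantine faults the formulas give 6 versus 4. The trade-off is on the recovery path: restoring a crashed primary requires reading every survivor, whereas a replica can be read directly.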
Index Terms:
replicated databases, client-server systems, computational complexity, data structures, fault tolerant computing, Java, distributed systems, fused data structure backup library, distributed servers, crash fault tolerance, dead data structure replication, unresponsive data structure replication, erasure codes, Byzantine faults, malicious data structures, Java collection framework, Amazon, Dynamo, Computer crashes, Servers, Indexes, Fault tolerance, Fault tolerant systems, Arrays, Distributed systems, fault tolerance
Citation:
B. Balasubramanian, V. K. Garg, "Fault Tolerance in Distributed Systems Using Fused Data Structures," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 4, pp. 701-715, April 2013, doi:10.1109/TPDS.2012.96