2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
ISBN: 978-1-4244-5445-7
pp: 657-668
Christopher Yang, CSAIL, MIT, 77 Massachusetts Ave, Cambridge, 02139, USA
Christine Yen, CSAIL, MIT, 77 Massachusetts Ave, Cambridge, 02139, USA
Ceryen Tan, CSAIL, MIT, 77 Massachusetts Ave, Cambridge, 02139, USA
Samuel R. Madden, CSAIL, MIT, 77 Massachusetts Ave, Cambridge, 02139, USA
ABSTRACT
In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed, shared-nothing database. Rather than aborting and restarting queries, our system, Osprey, divides running queries into subqueries and replicates data so that each subquery can be rerun on a different node if the node initially responsible fails or responds too slowly. Our approach is inspired by the fault tolerance properties of MapReduce, in which map or reduce jobs are greedily assigned to workers, and failed jobs are rerun on other workers.
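The rerun-on-a-replica idea in the abstract can be illustrated with a minimal Python sketch. It is not Osprey's implementation: the placement rule in `replica_nodes`, the `run_query` driver, and the `run_subquery` callback are all hypothetical names introduced here, and the sketch assumes each data chunk is simply copied to the next node in a ring. It only shows how a subquery could fall through to another node holding the same data when its first node is down or too slow.

```python
def replica_nodes(chunk, num_nodes, replicas=2):
    """Hypothetical placement: chunk i is stored on nodes i, i+1, ... (mod num_nodes)."""
    return [(chunk + k) % num_nodes for k in range(replicas)]


def run_query(num_chunks, num_nodes, failed_nodes, run_subquery, replicas=2):
    """Run one subquery per chunk; rerun on a replica if the first node is unavailable.

    failed_nodes stands in for nodes that have crashed or are responding too slowly.
    Returns {chunk: (node_that_ran_it, subquery_result)}.
    """
    results = {}
    for chunk in range(num_chunks):
        for node in replica_nodes(chunk, num_nodes, replicas):
            if node in failed_nodes:
                continue  # skip this node and try the next replica
            results[chunk] = (node, run_subquery(node, chunk))
            break
        else:
            # Every node holding this chunk is unavailable.
            raise RuntimeError(f"no live replica for chunk {chunk}")
    return results


# Usage: 4 chunks on 4 nodes, node 1 has failed mid-query.
results = run_query(4, 4, failed_nodes={1},
                    run_subquery=lambda node, chunk: chunk * 10)
print([results[c][0] for c in range(4)])  # nodes used per chunk: [0, 2, 2, 3]
```

In this sketch the query as a whole never restarts: only the subquery for chunk 1 moves to node 2, which holds a copy of that chunk under the assumed placement.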
CITATION

C. Yang, C. Yen, C. Tan and S. R. Madden, "Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database," 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA, USA, 2010, pp. 657-668.
doi:10.1109/ICDE.2010.5447913