Parallel and Distributed Processing Symposium, International (2009)
Rome, Italy
May 23, 2009 to May 29, 2009
ISBN: 978-1-4244-3751-1
pp: 1-12
A. L. Rosenberg, Colorado State University, USA
A. Benoit, ENS Lyon, France
Y. Robert, ENS Lyon, France
F. Vivien, INRIA, France
ABSTRACT
One has a large workload that is “divisible” (its constituent work's granularity can be adjusted arbitrarily), and one has access to p remote computers that can assist in computing the workload. The problem is that the remote computers are subject to interruptions of known likelihood that kill all work in progress. One wishes to orchestrate sharing the workload with the remote computers in a way that maximizes the expected amount of work completed. Strategies for achieving this goal are studied; they balance the desire to checkpoint often, in order to decrease the amount of vulnerable work at any point, against the desire to avoid the context-switching required to checkpoint. Strategies are devised that provably maximize the expected amount of work when there is only one remote computer (the case p = 1). Results suggest the intractability of such maximization for higher values of p, which motivates the development of heuristic approaches. Heuristics are developed that replicate work on several remote computers, in the hope of thereby decreasing the impact of work-killing interruptions. The quality of these heuristics is assessed through exhaustive simulations.
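The checkpointing trade-off described above can be illustrated with a small sketch for the single-computer case (p = 1). Everything in it is an assumption made for illustration and is not taken from the abstract: the linear risk model (the probability of an interruption by time t is kappa * t), the fixed per-chunk overhead, the equal-sized chunks, and the names `survival_linear`, `expected_work`, and `equal_chunks`. It is a toy model, not the strategies developed in the paper.

```python
def survival_linear(t, kappa):
    """Assumed risk model: probability that no interruption has occurred by time t."""
    return max(0.0, 1.0 - kappa * t)


def expected_work(chunks, overhead, kappa):
    """Expected completed work when chunks are processed one after another.

    Assumptions (hypothetical): one unit of work takes one unit of time,
    each chunk pays a fixed checkpoint overhead, and a chunk counts only
    if the computer is still uninterrupted when the chunk is checkpointed.
    """
    elapsed = 0.0
    total = 0.0
    for w in chunks:
        elapsed += w + overhead          # time at which this chunk is safely returned
        total += w * survival_linear(elapsed, kappa)
    return total


def equal_chunks(workload, n):
    """Split the workload into n equal chunks (one simple schedule among many)."""
    return [workload / n] * n


if __name__ == "__main__":
    # Hypothetical parameters: workload 10, overhead 0.5 per chunk, kappa 0.05.
    for n in (1, 2, 4, 8, 16):
        value = expected_work(equal_chunks(10.0, n), overhead=0.5, kappa=0.05)
        print(n, round(value, 4))
```

With these assumed parameters, the expected work first rises and then falls as the number of chunks grows: a single chunk leaves all of the work vulnerable, while many small chunks spend too much time on checkpoint overhead.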
CITATION
A. L. Rosenberg, A. Benoit, Y. Robert, F. Vivien, "Static strategies for worksharing with unrecoverable interruptions", Parallel and Distributed Processing Symposium, International, pp. 1-12, 2009, doi:10.1109/IPDPS.2009.5161044