25th IEEE Symposium on Reliable Distributed Systems (SRDS'06), Leeds, United Kingdom, October 2-4, 2006. ISBN: 0-7695-2677-2
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SRDS.2006.38
We consider the problem of recovering from failures of distributable threads with assured timeliness. When a node hosting a portion of a distributable thread fails, it causes orphans, i.e., thread segments that are disconnected from the thread's root. We consider a termination model for recovering from such failures, where the orphans must be detected and aborted, and a failure-exception notification must be delivered to the farthest contiguous surviving thread segment so that thread execution can resume. We present a real-time scheduling algorithm called AUA, and a distributable thread integrity protocol called TP-TR. We show that AUA and TP-TR bound the orphan cleanup and recovery time, thereby bounding thread starvation durations, and maximize the total thread accrued timeliness utility. We implement AUA and TP-TR in a real-time middleware that supports distributable threads. Our experimental studies with the implementation validate the algorithm/protocol's time-bounded recovery property and confirm their effectiveness.
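The termination model described in the abstract can be illustrated with a minimal sketch. It is not the paper's AUA algorithm or TP-TR protocol, and it does not model scheduling or time bounds. It assumes a distributable thread can be represented as an ordered chain of segments starting at the root; the names `Segment` and `classify_after_failure` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    index: int  # position in the invocation chain, 0 = root (assumed representation)
    node: str   # hypothetical identifier of the node hosting this segment


def classify_after_failure(chain, failed_nodes):
    """Return (resume_segment, orphans) for a thread chain after node failures.

    resume_segment: the farthest segment still contiguously connected to the
        root, which would receive the failure-exception notification.
        None if the root's node failed or if no segment was affected.
    orphans: surviving segments beyond the break, disconnected from the root,
        which the termination model requires to be detected and aborted.
    """
    # Find the first segment (closest to the root) hosted on a failed node.
    break_at = next(
        (i for i, seg in enumerate(chain) if seg.node in failed_nodes), None
    )
    if break_at is None:
        return None, []  # the failure does not touch this thread

    resume = chain[break_at - 1] if break_at > 0 else None
    # Segments past the break that still run on live nodes are orphans.
    orphans = [seg for seg in chain[break_at + 1:] if seg.node not in failed_nodes]
    return resume, orphans


chain = [Segment(0, "A"), Segment(1, "B"), Segment(2, "C"), Segment(3, "D")]
resume, orphans = classify_after_failure(chain, {"C"})
```

In this example the segment on node B is the farthest contiguous surviving segment, and the segment on node D is an orphan to be aborted.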
Citation:
Edward Curley, Jonathan Anderson, Binoy Ravindran, E. D. Jensen, "Recovering from Distributable Thread Failures with Assured Timeliness in Real-Time Distributed Systems," srds, pp.267-276, 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06), 2006