We consider the problem of recovering from failures of distributable threads in distributed real-time systems that operate under run-time uncertainties, including those on thread execution times, thread arrivals, and node failure occurrences. When a thread encounters a node failure, it causes orphans. Under a termination model, the orphans must be detected and aborted, and exceptions must be delivered to the farthest, contiguous surviving thread segment for resuming thread execution. Our application/scheduling model includes distributable threads and their exception handlers that are subject to time/utility function (TUF) time constraints and a utility accrual (UA) optimality criterion. A key underpinning of the TUF/UA scheduling paradigm is the notion of "best-effort," where high-importance threads are always favored over low-importance ones, irrespective of thread urgency. We present a scheduling algorithm called HUA and a thread integrity protocol called TPR. We show that HUA and TPR bound the orphan cleanup and recovery time with bounded loss of the best-effort property. Our implementation experience of HUA/TPR within Sun's Distributed Real-Time Specification for Java demonstrates the algorithm/protocol's effectiveness.
Citation:
Binoy Ravindran, Edward Curley, Jonathan S. Anderson, E. Douglas Jensen, "On Best-Effort Real-Time Assurances for Recovering from Distributable Thread Failures in Distributed Real-Time Systems," 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC'07), pp. 344-353, 2007.