Issue No. 06 - June (2007 vol. 18)
Guang Tan, IEEE
Stephen A. Jarvis, IEEE
<p><b>Abstract</b>—A key technical challenge for overlay multicast is that the highly dynamic multicast members can make data delivery unreliable. In this paper, we address this issue in the context of live media streaming by exploring 1) how to construct a stable multicast tree that minimizes the negative impact of frequent member departures on an existing overlay and 2) how to efficiently recover from packet errors caused by end-system or network failures. For the first problem, we identify two layout schemes for the tree nodes, namely, the <it>bandwidth-ordered</it> tree and the <it>time-ordered</it> tree, which represent two typical approaches to improving tree reliability, and conduct a stochastic analysis on their properties regarding reliability and tree depth. Based on the findings, we propose a distributed <it>Reliability-Oriented Switching Tree</it> (ROST) algorithm that minimizes the failure correlation among tree nodes. Compared with some commonly used distributed algorithms, the ROST algorithm significantly improves tree reliability and reduces average service delay, while incurring only a small protocol overhead; furthermore, it features a mechanism that prevents cheating or malicious behaviors in the exchange of bandwidth/time information. For the second problem, we develop a simple <it>Cooperative Error Recovery</it> (CER) protocol that helps recover from packet errors efficiently. Recognizing that a single recovery source is usually incapable of providing the timely delivery of the lost data, the protocol recovers from data outages using the residual bandwidths from multiple sources, which are identified using a minimum-loss-correlation algorithm. Extensive simulations demonstrate the effectiveness of the proposed schemes.</p>
Index Terms: Reliability, fault resilience, multicast, media streaming, peer-to-peer, overlay.
Guang Tan, Stephen A. Jarvis, "Improving the Fault Resilience of Overlay Multicast for Media Streaming", IEEE Transactions on Parallel & Distributed Systems, vol. 18, no. 6, pp. 721-734, June 2007, doi:10.1109/TPDS.2007.1054