The Community for Technology Leaders
Green Image
<p><b>Abstract</b>—Two general approaches have been proposed for deadlock handling in wormhole networks. Traditionally, deadlock-avoidance strategies have been used. In this case, either routing is restricted so that there are no cyclic dependencies between channels or cyclic dependencies between channels are allowed provided that there are some escape paths to avoid deadlock. More recently, deadlock recovery strategies have begun to gain acceptance. These strategies allow the use of unrestricted fully adaptive routing, usually outperforming deadlock avoidance techniques. However, they require a deadlock detection mechanism and a deadlock recovery mechanism that is able to recover from deadlocks faster than they occur. In particular, progressive deadlock recovery techniques are very attractive because they allocate a few dedicated resources to quickly deliver deadlocked messages, instead of killing them. Unfortunately, distributed deadlock detection is usually based on crude time-outs, which detect many false deadlocks. As a consequence, messages detected as deadlocked may saturate the bandwidth offered by recovery resources, thus degrading performance. Additionally, the threshold required by the detection mechanism (the time-out) strongly depends on network load, which is not known in advance at the design stage. This limits the applicability of deadlock recovery on actual networks. In this paper, we propose a novel distributed deadlock detection mechanism that uses only local information, detects all the deadlocks, considerably reduces the probability of false deadlock detection over previously proposed techniques, and is not significantly affected by variations in message length and/or message destination distribution.</p>
Wormhole switching, deadlock detection, deadlock recovery, true fully adaptive routing.

P. L?pez, J. Duato and J. Mart?nez Rubio, "FC3D: Flow Control-Based Distributed Deadlock Detection Mechanism for True Fully Adaptive Routing in Wormhole Networks," in IEEE Transactions on Parallel & Distributed Systems, vol. 14, no. , pp. 765-779, 2003.
86 ms
(Ver 3.3 (11022016))