Issue No. 08 - August (2002 vol. 51)
<p>This paper considers defect tolerance issues for parallel computing systems based on a new interconnection network, namely "Tori connected mESHes (TESH)." Key features of this network are the following: It is hierarchical, thus allowing exploitation of computation locality and systematic expansion up to a million processors, and it appears to be well-suited for VLSI/ULSI realization, including 3D implementation. The goal here is to present efficient reconfiguration algorithms for such hierarchical parallel computing systems. Despite the dramatic improvement in defect density in recent years, it is still necessary to provide redundancy and defect circumvention to achieve acceptable system-level yields for large multicomputer systems. The TESH-based parallel systems are no exception. Therefore, we develop placement and routing algorithms that assign logical nodes to healthy physical nodes and configure switches to bypass the defective cells, switches, and links. Simulations indicate that the placement (or remapping) is nearly 100 percent effective, while the routing performance diminishes with increasing defect density for a given extent of redundancy. The approach scales up well because, in TESH networks, essentially the same kind of sparing is used at all levels.</p>
Interconnection networks, hierarchical networks, TESH, parallel computing systems, VLSI, ULSI, manufacturing defects, fault-tolerance, redundancy, reconfiguration, routing, yield.
B. Maziarz and V. Jain, "Automatic Reconfiguration and Yield of the TESH Multicomputer Network," in IEEE Transactions on Computers, vol. 51, no. 8, pp. 963-972, 2002.