An Efficient Modular Spare Allocation Scheme and Its Application to Fault Tolerant Binary Hypercubes
January 1991 (vol. 2 no. 1)
pp. 117-126

This paper considers fault tolerant systems built from modules called fault tolerant basic blocks (FTBBs), each containing some primary nodes and some spare nodes. Full spare utilization is achieved when each spare within an FTBB can replace any other primary or spare node in that FTBB, but this may be prohibitively expensive for larger FTBBs. It is shown that, for a given hardware overhead, more reliable systems can be designed using bigger FTBBs without full spare utilization than using smaller FTBBs with full spare utilization. Sufficient conditions for maximizing the reliability of a spare allocation strategy in an FTBB for a given hardware overhead are presented. The proposed spare allocation strategy is applied to two fault tolerant reconfiguration schemes for binary hypercubes: one uses hardware switches to replace a faulty node, and the other uses fault tolerant routing to bypass faulty nodes in the system and deliver messages to the destination node.
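
The trade-off between block size and spare utilization can be made concrete with a rough reliability calculation. The sketch below is not the paper's model: it assumes independent node failures with a common node reliability, ignores switch and link failures, and covers only the full spare utilization baseline (a block survives if the number of failed nodes does not exceed its number of spares). The function names and the example figures (node reliability 0.95, 16 primaries, 25% spare overhead) are illustrative assumptions.

```python
from math import comb


def block_reliability(primaries: int, spares: int, node_rel: float) -> float:
    """Reliability of one FTBB under full spare utilization (assumed model).

    The block works if at most `spares` of its primaries + spares nodes fail,
    with each node failing independently with probability 1 - node_rel.
    """
    total = primaries + spares
    fail = 1.0 - node_rel
    return sum(
        comb(total, i) * fail**i * node_rel ** (total - i)
        for i in range(spares + 1)
    )


def system_reliability(blocks: int, primaries: int, spares: int,
                       node_rel: float) -> float:
    """Reliability of a system that needs every one of its FTBBs to work."""
    return block_reliability(primaries, spares, node_rel) ** blocks


# Same hardware overhead (one spare per four primaries), 16 primaries total.
small_blocks = system_reliability(blocks=4, primaries=4, spares=1, node_rel=0.95)
one_large_block = system_reliability(blocks=1, primaries=16, spares=4, node_rel=0.95)

print(f"four small FTBBs: {small_blocks:.4f}")
print(f"one large FTBB:   {one_large_block:.4f}")
```

Under these assumptions the single large block comes out more reliable than the four small ones, which is why larger FTBBs are attractive in the first place. The abstract's claim concerns the harder case in which the larger block gives up full spare utilization to keep switching cost down; modelling that case requires the allocation conditions given in the paper itself.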

Index Terms:
modular spare allocation scheme; fault tolerant binary hypercubes; fault tolerant basic blocks; primary nodes; spare nodes; hardware switches; fault tolerant computing; hypercube networks; multiprocessing systems
M.S. Alam, R.G. Melhem, "An Efficient Modular Spare Allocation Scheme and Its Application to Fault Tolerant Binary Hypercubes," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 1, pp. 117-126, Jan. 1991, doi:10.1109/71.80194