Enhanced Cluster k-Ary n-Cube, A Fault-Tolerant Multiprocessor
November 2003 (vol. 52 no. 11)
pp. 1443-1453
Baback A. Izadi, Füsun Özgüner

Abstract: In this paper, we present a strongly fault-tolerant design for the k-ary n-cube multiprocessor and examine its reconfigurability. Our design augments the k-ary n-cube with $(k/j)^n$ spare nodes. Each set of $j^n$ regular nodes is connected to a spare node, and the spare nodes are interconnected as either a $(k/j)$-ary n-cube if $j \ne k/2$ or a hypercube of dimension $n$ if $j = k/2$. Our approach utilizes the capabilities of the wave-switching communication modules of the spare nodes to tolerate a large number of faulty nodes. Both theoretical and experimental results are examined. Compared with other proposed schemes, our approach can tolerate significantly more faulty nodes with a low overhead and no performance degradation.
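
The spare arrangement described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, and it assumes that each cluster is a contiguous j x ... x j block of the k-ary n-cube, so that a node's spare is found by integer-dividing each coordinate by j. The abstract does not state how the j^n regular nodes of a cluster are chosen.

```python
def spare_count(k, j, n):
    """Number of spare nodes, (k/j)^n, assuming j divides k."""
    if k % j != 0:
        raise ValueError("j must divide k")
    return (k // j) ** n


def spare_of(node, j):
    """Spare serving a regular node.

    Assumption (not from the abstract): clusters are contiguous
    blocks, so the spare's coordinates are the node's coordinates
    integer-divided by j.
    """
    return tuple(d // j for d in node)


def spare_neighbors(spare, k, j):
    """Neighbors of a spare in the (k/j)-ary n-cube of spares."""
    m = k // j  # radix of the spare network
    neighbors = set()
    for dim, d in enumerate(spare):
        for step in (1, -1):
            nb = list(spare)
            nb[dim] = (d + step) % m  # wraparound link
            if tuple(nb) != spare:
                neighbors.add(tuple(nb))
    return sorted(neighbors)


# k = 4, j = 2, n = 2: a 4-ary 2-cube with (4/2)^2 = 4 spares.
print(spare_count(4, 2, 2))            # 4
print(spare_of((3, 1), 2))             # (1, 0)
print(spare_neighbors((0, 0), 4, 2))   # [(0, 1), (1, 0)]
```

In the last call j = k/2, so the spare network has radix 2. The +1 and -1 steps in each dimension then reach the same node, and each spare has exactly n neighbors, which is why that case is a hypercube of dimension n. Under the same counting, the ratio of spare nodes to regular nodes is (k/j)^n / k^n = 1/j^n.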

Index Terms:
Fault tolerance, k-ary n-cube, hypercube, spare allocation, reconfiguration, augmented multiprocessor, wave switching.
Citation:
Baback A. Izadi, Füsun Özgüner, "Enhanced Cluster k-Ary n-Cube, A Fault-Tolerant Multiprocessor," IEEE Transactions on Computers, vol. 52, no. 11, pp. 1443-1453, Nov. 2003, doi:10.1109/TC.2003.1244942