A Fault-Tolerant Distributed Subcube Management Scheme for Hypercube Multicomputer Systems
July 1995 (vol. 6 no. 7)
pp. 766-772

Abstract: This paper proposes a fault-tolerant distributed subcube management scheme for hypercube multicomputer systems. Gracefully degradable subcube management is supported by a data structure, called the distributed subcube table (DST), and a fault-tolerant broadcast protocol, called the reliably synchronized broadcast (RSB). In an $n$-dimensional hypercube, the DST is the collection of $2^n$ local subcube tables (LSTs), $DST = \{LST_0,\,LST_1,\,\dots,\,LST_{2^n-1}\}$, where $LST_x$ is a bit-mapped table assigned to $N_x$, a fault-free node whose address is $x$. $LST_x$, $\forall x$, is $n + 1$ bits long, and it records the status (free/busy) of certain subcubes adjacent to $N_x$. The RSB diagnoses and avoids faults during interprocessor communication to prevent faulty nodes from being allocated for job execution. In addition to possessing a fault-tolerant design, our scheme can also achieve comparable or better performance than existing centralized schemes, as verified by extensive simulation.
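
The table layout described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the abstract states only that each LST is an $(n+1)$-bit map held by a fault-free node, and does not say which subcube each bit refers to. The class name `DistributedSubcubeTable`, its methods, and the convention that a set bit means "busy" are all hypothetical, and the bit index `k` is treated as an opaque slot number.

```python
FREE, BUSY = 0, 1  # assumed encoding: 0 = free, 1 = busy


class DistributedSubcubeTable:
    """Illustrative DST: one (n+1)-bit table per fault-free node."""

    def __init__(self, n, faulty=()):
        self.n = n
        self.faulty = frozenset(faulty)
        # Only fault-free nodes hold an LST; every bit starts out FREE.
        self.tables = {x: 0 for x in range(2 ** n) if x not in self.faulty}

    def neighbors(self, x):
        """Addresses adjacent to node x (differ from x in exactly one bit)."""
        return [x ^ (1 << d) for d in range(self.n)]

    def set_status(self, x, k, status):
        """Set bit k of LST_x; what slot k denotes is left unspecified here."""
        if not 0 <= k <= self.n:
            raise ValueError("an LST has only n + 1 bits")
        if status == BUSY:
            self.tables[x] |= 1 << k
        else:
            self.tables[x] &= ~(1 << k)

    def status(self, x, k):
        """Return FREE or BUSY for bit k of LST_x."""
        return (self.tables[x] >> k) & 1


# Usage: a 3-cube in which node 5 is assumed to be faulty.
dst = DistributedSubcubeTable(3, faulty={5})
dst.set_status(0, 2, BUSY)
```

Storing each LST as a single integer keeps the per-node state at $n + 1$ bits, which matches the size given in the abstract; the fault-tolerant broadcast (RSB) that would keep these tables consistent is not modeled.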

Index Terms:
Distributed subcube management, fault-tolerance, hypercube multicomputer, reliable broadcast.
Citation:
Yi-long Chen, Jyh-Charn Liu, "A Fault-Tolerant Distributed Subcube Management Scheme for Hypercube Multicomputer Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 7, pp. 766-772, July 1995, doi:10.1109/71.395406