Load Sharing in Hypercube-Connected Multicomputers in the Presence of Node Failures
October 1996 (vol. 45 no. 10)
pp. 1203-1211

Abstract—This paper addresses two important issues associated with load sharing (LS) in hypercube-connected multicomputers: 1) ordering fault-free nodes as preferred receivers of "overflow" tasks for each overloaded node and 2) developing an LS mechanism to handle node failures. Nodes are arranged into preferred lists of receivers of overflow tasks in such a way that each node is selected as the kth preferred node of one and only one other node [1]. Such lists are proven to allow the overflow tasks to be evenly distributed throughout the entire system. However, if failed nodes are simply dropped from the lists, node failures destroy the original structure of a preferred list, forcing some nodes to be selected as the kth preferred node of more than one other node. We propose three algorithms that modify the preferred lists so that their original features are retained regardless of the number of faulty nodes in the system. It is shown that the number of adjustments, or the communication overhead, of these algorithms is minimal. Using the modified preferred lists, we also propose a simple mechanism to tolerate node failures: each node is equipped with a backup queue that stores and updates information on the tasks arriving at and completing on its most preferred node.
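The abstract does not say how the preferred lists are built, so the following is a minimal sketch under an assumed XOR-based construction, not necessarily the one used in [1]: node i's kth preferred node is i XOR d_k for a fixed sequence of distinct nonzero offsets d_k. Because i → i XOR d_k is a bijection on the 2^n node addresses, every node is the kth preferred node of exactly one other node. The sketch then handles a failure naively, by dropping the failed node from every list, to reproduce the imbalance the abstract describes. The function names and the offset ordering (by Hamming weight, so direct neighbors come first) are hypothetical, and the sketch does not implement the paper's three repair algorithms.

```python
def preferred_lists(n, offsets):
    """Hypothetical XOR construction: node i's k-th preferred node is i ^ offsets[k]."""
    size = 1 << n  # 2**n nodes in an n-dimensional hypercube
    assert len(set(offsets)) == len(offsets), "offsets must be distinct"
    assert all(0 < d < size for d in offsets), "offsets must be nonzero n-bit values"
    return {i: [i ^ d for d in offsets] for i in range(size)}


def is_balanced(lists):
    """True if no node prefers itself and, for every rank k, each listed node
    is the k-th preferred node of exactly one node (column k is a permutation)."""
    nodes = sorted(lists)
    depth = min(len(lst) for lst in lists.values())
    for k in range(depth):
        if any(lst[k] == i for i, lst in lists.items()):
            return False  # a node may not be its own preferred receiver
        column = sorted(lst[k] for lst in lists.values())
        if column != nodes:
            return False  # some node is k-th preferred by zero or several nodes
    return True


def drop_failed(lists, failed):
    """Naive handling of failures: delete failed nodes' lists and remove
    failed nodes from every surviving list, shifting later entries up."""
    return {i: [j for j in lst if j not in failed]
            for i, lst in lists.items() if i not in failed}


# 3-cube (8 nodes); offsets ordered by Hamming weight, so neighbors come first.
lists = preferred_lists(3, [1, 2, 4, 3, 5, 6, 7])
print(lists[0])            # [1, 2, 4, 3, 5, 6, 7]
print(is_balanced(lists))  # True

naive = drop_failed(lists, {1})  # node 1 fails
print(naive[0][0], naive[3][0])  # 2 2  (node 2 is now first choice of both 0 and 3)
print(is_balanced(naive))        # False
```

After node 1 fails, node 0's first choice shifts from 1 to 2, which is already node 3's first choice, so node 2 becomes the most preferred receiver of two nodes. This is the kind of imbalance that the modified preferred lists are meant to avoid.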

[1] K.G. Shin and Y.-C. Chang, "A Coordinated Location Policy for Load Sharing in Hypercube-Connected Machines," IEEE Trans. Computers, vol. 44, no. 5, May 1995.
[2] Y.-T. Wang and R.J.T. Morris, "Load Sharing in Distributed Systems," IEEE Trans. Computers, vol. 34, no. 3, pp. 204-217, Mar. 1985.
[3] D.L. Eager, E.D. Lazowska, and J. Zahorjan, "Adaptive Load Sharing in Homogeneous Distributed Systems," IEEE Trans. Software Eng., vol. 12, no. 5, pp. 662-675, May 1986.
[4] L.M. Ni, C. Xu, and T.B. Gendreau, "A Distributed Drafting Algorithm for Load Balancing," IEEE Trans. Software Eng., vol. 11, no. 10, pp. 1153-1161, Oct. 1985.
[5] T.L. Casavant, "Analysis of Three Dynamic Distributed Load-Balancing Strategies with Varying Global Information Requirements," Proc. Seventh IEEE Int'l Conf. Distributed Computing Systems, pp. 185-192, 1987.
[6] K.G. Shin and Y.-C. Chang, "Load Sharing in Distributed Real-Time Systems with State Change Broadcasts," IEEE Trans. Computers, vol. 38, no. 8, pp. 1124-1142, Aug. 1989.

Index Terms:
Load sharing, hypercube-connected multicomputers, real-time systems, node failures, backup queues.
Yi-Chieh Chang, Kang G. Shin, "Load Sharing in Hypercube-Connected Multicomputers in the Presence of Node Failures," IEEE Transactions on Computers, vol. 45, no. 10, pp. 1203-1211, Oct. 1996, doi:10.1109/12.543714