Replicated Process Allocation for Load Distribution in Fault-Tolerant Multicomputers
April 1997 (vol. 46 no. 4)
pp. 499-505

Abstract—In this paper, we consider a load-balancing process allocation method for fault-tolerant multicomputer systems that balances the load both before and after faults start to degrade the performance of the system. To tolerate a single fault, each process (the primary process) is duplicated, i.e., it has a backup process. The backup process executes on a different processor from the primary; it checkpoints the primary process and recovers the process if the primary fails. We formalize the problem of load-balancing process allocation, propose a new process allocation method, and analyze its performance. Simulations are used to compare the proposed method with a process allocation method that does not take into account the different load characteristics of the primary and backup processes. While both methods perform well before a fault occurs, only the proposed method maintains a balanced load after such a fault.
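To make the primary/backup load distinction concrete, here is a minimal Python sketch. It is not the authors' allocation method: the load model (a backup costs bl[i] while it only checkpoints, and takes over the full primary load pl[i] once the primary's processor fails), the greedy heuristic, and every name (processor_loads, peak_load, allocate, fault_aware) are hypothetical. With fault_aware=False the greedy step only balances fault-free load; with fault_aware=True it also scores every single-processor fault, which illustrates, but does not reproduce, the before/after contrast described above.

```python
from typing import List, Optional, Sequence, Tuple


def processor_loads(n: int, pl: Sequence[float], bl: Sequence[float],
                    prim: Sequence[int], back: Sequence[int],
                    failed: Optional[int] = None) -> List[float]:
    """Load on each of n processors, fault-free (failed=None) or after
    processor `failed` crashes.

    Assumed model: primary i costs pl[i]; its backup, which only handles
    checkpoints, costs bl[i]. If the primary's processor fails, the backup
    takes over and costs pl[i] from then on.
    """
    load = [0.0] * n
    for i in range(len(prim)):
        if prim[i] == failed:
            load[back[i]] += pl[i]      # backup is promoted to primary
        else:
            load[prim[i]] += pl[i]      # primary keeps running
            if back[i] != failed:
                load[back[i]] += bl[i]  # backup keeps checkpointing
    return load


def peak_load(n: int, pl: Sequence[float], bl: Sequence[float],
              prim: Sequence[int], back: Sequence[int],
              fault_aware: bool = True) -> float:
    """Highest processor load in fault-free operation and, if fault_aware,
    after any single processor fault."""
    scenarios: List[Optional[int]] = [None]
    if fault_aware:
        scenarios += list(range(n))
    return max(max(processor_loads(n, pl, bl, prim, back, f)) for f in scenarios)


def allocate(n: int, pl: Sequence[float], bl: Sequence[float],
             fault_aware: bool = True) -> Tuple[List[int], List[int]]:
    """Hypothetical greedy heuristic: place processes in decreasing order of
    primary load, choosing the (primary, backup) processor pair that keeps
    peak_load of the partial allocation smallest."""
    if n < 2:
        raise ValueError("need at least two processors to separate replicas")
    order = sorted(range(len(pl)), key=lambda i: -pl[i])
    prim, back = [0] * len(pl), [0] * len(pl)
    placed: List[int] = []
    for i in order:
        idx = placed + [i]
        best: Optional[Tuple[float, int, int]] = None
        for p in range(n):
            for b in range(n):
                if p == b:
                    continue  # primary and backup must not share a processor
                prim[i], back[i] = p, b
                score = peak_load(n, [pl[j] for j in idx], [bl[j] for j in idx],
                                  [prim[j] for j in idx], [back[j] for j in idx],
                                  fault_aware)
                if best is None or score < best[0]:
                    best = (score, p, b)
        assert best is not None
        prim[i], back[i] = best[1], best[2]
        placed.append(i)
    return prim, back


if __name__ == "__main__":
    pl = [8.0, 6.0, 5.0, 4.0, 4.0, 3.0]   # illustrative primary loads
    bl = [1.0, 1.0, 0.5, 0.5, 0.5, 0.5]   # illustrative backup (checkpoint) loads
    for aware in (False, True):
        prim, back = allocate(4, pl, bl, fault_aware=aware)
        print(aware, prim, back,
              peak_load(4, pl, bl, prim, back, fault_aware=False),
              peak_load(4, pl, bl, prim, back, fault_aware=True))
```

Comparing the fault-free and worst-case peaks printed for the two allocations is one way to explore how ignoring the post-fault scenario can leave a surviving processor overloaded. A greedy pass like this carries no optimality guarantee; an exact formulation would need something like integer programming.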

Index Terms:
Backup process, checkpointing, fault-tolerant multicomputer, load balancing, process allocation.
Jong Kim, Heejo Lee, Sunggu Lee, "Replicated Process Allocation for Load Distribution in Fault-Tolerant Multicomputers," IEEE Transactions on Computers, vol. 46, no. 4, pp. 499-505, April 1997, doi:10.1109/12.588067