Reconfiguration Models and Algorithms for Stateful Interactive Processes
May/June 1999 (vol. 25 no. 3)
pp. 401-415

Abstract: In this paper, we present new results on the reconfiguration of stateful interactive processes in the presence of faults. More precisely, we consider a set of servers/processes that have the same functionality, i.e., they can perform the same tasks and provide the same set of services to their clients. When several of them turn out to be faulty, we want to reconfigure the system so that the clients of the faulty servers/processes are served by other, fault-free servers of the system in a way that is transparent to all the system clients. We propose a new method for reconfiguring in the presence of faults: compensation paths. Compensation paths are an efficient way of shifting spare resources from where they are available to where they are needed. We also present simple optimal and suboptimal reconfiguration algorithms of low polynomial time complexity: O(nm log(n^2/m)) for the optimal algorithms and O(m) for the suboptimal ones, where n is the number of processes and m is the number of primary-backup relationships. The optimal algorithms, which are centralized, compute a way to reconfigure the system whenever reconfiguration is possible. The suboptimal algorithms may sometimes fail to reconfigure the system even though the optimal centralized algorithms would succeed. However, the suboptimal algorithms have advantages over the optimal centralized algorithms in time complexity and communication overhead.
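
The abstract does not spell out the algorithms themselves. The O(nm log(n^2/m)) bound coincides with the known running time of Goldberg and Tarjan's push-relabel maximum-flow algorithm, which suggests, although the abstract does not say so, that the optimal case can be read as a flow-feasibility question. The Python sketch below illustrates that reading only and is not the authors' method: it uses plain breadth-first augmenting paths (Edmonds-Karp style, with a weaker time bound), and the function name `reconfigure`, the arguments `spare`, `backups`, and `faulty_load`, and the server names are all hypothetical. It assumes that each client of a faulty server may be shifted along primary-backup relationships, and that a fault-free server with no free slot may pass one client on to its own backup to make room.

```python
from collections import deque

def reconfigure(spare, backups, faulty_load):
    """Illustrative flow-based feasibility check (assumed model, not the paper's).

    spare:       dict fault-free server -> number of free client slots
    backups:     dict server -> list of servers that can take over its clients
    faulty_load: dict faulty server -> number of clients that must be relocated
                 (assumed to list every faulty server, even with 0 clients)
    Returns a dict {(src, dst): clients shifted} or None if no plan exists.
    """
    SRC, SNK = "_source", "_sink"
    cap = {SRC: {}, SNK: {}}  # residual capacities cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse residual edge

    total = sum(faulty_load.values())
    for f, load in faulty_load.items():
        add_edge(SRC, f, load)          # clients that must leave server f
    for s, free in spare.items():
        add_edge(s, SNK, free)          # free slots that can absorb a client
    for p, bs in backups.items():
        for b in bs:
            if b not in faulty_load:    # never shift clients onto a faulty server
                add_edge(p, b, total)   # primary-backup link, effectively unbounded

    shifts, moved = {}, 0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = {SRC: None}
        queue = deque([SRC])
        while queue and SNK not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if SNK not in parent:
            break
        path = [SNK]
        while path[-1] != SRC:
            path.append(parent[path[-1]])
        path.reverse()
        # Push one client along the path and update the net shifts.
        for u, v in zip(path, path[1:]):
            cap[u][v] -= 1
            cap[v][u] += 1
            if SRC in (u, v) or SNK in (u, v):
                continue
            if shifts.get((v, u), 0) > 0:
                shifts[(v, u)] -= 1     # cancel an earlier shift in the other direction
            else:
                shifts[(u, v)] = shifts.get((u, v), 0) + 1
        moved += 1

    if moved != total:
        return None                     # some client cannot be relocated
    return {edge: k for edge, k in shifts.items() if k > 0}


# Faulty server F has two clients; its only backup X has one free slot,
# so X passes one client on to its own backup Y.
plan = reconfigure(spare={"X": 1, "Y": 1},
                   backups={"F": ["X"], "X": ["Y"]},
                   faulty_load={"F": 2})
print(plan)
```

In this toy instance the chain F, X, Y plays the role of a compensation path in the sense described above: the spare slot at Y is effectively shifted to where it is needed. If Y had no free slot, the function would return `None`, since no reconfiguration exists under the assumed model.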

Index Terms:
Reconfiguration of stateful processes, compensation paths, polynomial-time algorithms, optimal centralized algorithms, suboptimal distributed and hybrid algorithms.
Theodora A. Varvarigou, Miltiadis E. Anagnostou, Sudhir R. Ahuja, "Reconfiguration Models and Algorithms for Stateful Interactive Processes," IEEE Transactions on Software Engineering, vol. 25, no. 3, pp. 401-415, May-June 1999, doi:10.1109/32.798328