Modeling of Hierarchical Distributed Systems with Fault-Tolerance
April 1990 (vol. 16 no. 4)
pp. 444-457

Because the levels of a hierarchical system can have different characteristics, different fault-tolerant schemes may be appropriate at different levels. A stochastic Petri net (SPN) is used to investigate various fault-tolerant schemes in this context. The basic SPN is augmented with parameterized subnet primitives to model the fault-tolerant schemes. Both centralized and distributed fault-tolerant schemes are considered, and each is investigated by treating the individual levels of a hierarchical system independently. For distributed fault tolerance, two checkpointing strategies are considered. In the first, the arbitrary checkpointing strategy, each process takes its checkpoints independently, so the domino effect may occur. In the second, the planned strategy, process checkpointing is constrained to ensure that no domino effect occurs. The results show that, under certain conditions, an arbitrary checkpointing strategy can perform better than a planned strategy. The effect of integration on the fault-tolerant strategies of the various levels of a hierarchy is also studied.
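
The domino effect mentioned in the abstract can be illustrated with a small sketch. This is not the paper's SPN model; it is a minimal rollback-propagation loop under assumed conventions: event times are positive numbers on a shared clock, every process has an initial checkpoint at time 0, and a message is "orphaned" when a rollback erases its send but not its receive. The function name `rollback_points` and the example schedules are hypothetical, and aligning checkpoints across processes is only one assumed way of constraining them.

```python
import math

def rollback_points(checkpoints, messages, failed, fail_time):
    """Return the time each process restarts from after `failed` crashes.

    checkpoints: {process: checkpoint times, including 0 for the initial state}
    messages:    list of (sender, send_time, receiver, recv_time), all > 0
                 and all earlier than fail_time
    math.inf in the result means the process did not have to roll back.
    """
    def latest_checkpoint_before(proc, t):
        return max(c for c in checkpoints[proc] if c < t)

    restart = {p: math.inf for p in checkpoints}
    restart[failed] = latest_checkpoint_before(failed, fail_time)

    changed = True
    while changed:
        changed = False
        for sender, sent, receiver, received in messages:
            # Orphan message: the sender's rollback erased the send, but the
            # receiver's current state still includes the receive.
            if sent > restart[sender] and received < restart[receiver]:
                restart[receiver] = latest_checkpoint_before(receiver, received)
                changed = True
    return restart

# Hypothetical message pattern between two processes P and Q.
messages = [("Q", 10, "P", 20), ("P", 50, "Q", 55),
            ("Q", 65, "P", 70), ("P", 85, "Q", 90)]

# Independent ("arbitrary") checkpoints: every checkpoint is crossed by a message.
arbitrary = rollback_points({"P": [0, 40, 80], "Q": [0, 60]}, messages, "P", 100)

# Constrained checkpoints (assumed here to be roughly aligned across processes).
planned = rollback_points({"P": [0, 40, 80], "Q": [0, 45, 82]}, messages, "P", 100)

print(arbitrary)  # {'P': 0, 'Q': 0}
print(planned)    # {'P': 80, 'Q': 82}
```

With the independent schedule, each rollback orphans an earlier message and both processes cascade back to their initial states. With the aligned schedule, the rollback stops at the most recent pair of checkpoints. The sketch only shows why the constraint bounds rollback distance; it says nothing about the checkpointing overhead that the paper's performance comparison also weighs.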

Index Terms:
hierarchical distributed systems modelling; fault-tolerance; stochastic Petri net; parameterized subnet primitives; centralized; checkpointing strategies; arbitrary checkpointing strategy; planned strategy; distributed processing; fault tolerant computing; Petri nets.
Y.-B. Shieh, D. Ghosal, P.R. Chintamaneni, S.K. Tripathi, "Modeling of Hierarchical Distributed Systems with Fault-Tolerance," IEEE Transactions on Software Engineering, vol. 16, no. 4, pp. 444-457, April 1990, doi:10.1109/32.54296