Dependability Modeling and Analysis of Distributed Programs
May 1994 (vol. 20 no. 5)
pp. 345-352

This paper presents a modeling approach based on stochastic Petri nets for estimating the reliability and availability of programs in a distributed computing system. In such a system, a program executes successfully only if it can access the files it needs, which are distributed throughout the system. The use of stochastic Petri nets is demonstrated by extending a basic reliability model to account for repair actions when faults occur. Two repair models are discussed: the global repair model, which assumes a centralized repair team that restores the system to its original status once a failure state is reached, and the local repair model, which assumes that each repair is confined to the node where the fault occurred. The global repair model is useful for evaluating the availability of programs (or of the supporting hardware) subject to hardware faults that are repaired globally, so the programs of interest may be interrupted. The local repair model, by contrast, can be used to evaluate program reliability in the presence of hardware faults that are repaired without interrupting the normal operation of the system.
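To make the global repair idea concrete, here is a minimal Python sketch. It is not the stochastic Petri net model of the paper, only a simplified stand-in under stated assumptions: nodes fail independently with exponential rates, communication links never fail, a program runs as long as every required file has a copy on some working node, and once the program can no longer run a central team restores all nodes after an exponentially distributed time. Under those assumptions the system alternates between an "up" period that starts from the fully repaired state and a "down" period, so the steady-state availability is MTTF / (MTTF + 1/repair_rate). All names and numeric values are hypothetical.

```python
from functools import lru_cache


def program_runs(up_nodes, placement, required_files):
    """True if every required file has at least one copy on a working node."""
    return all(any(n in up_nodes for n in placement[f]) for f in required_files)


def mean_time_to_failure(fail_rate, placement, required_files):
    """Expected time until the program first cannot run, with no repair.

    fail_rate: dict node -> exponential failure rate (assumed values).
    placement: dict file -> set of nodes holding a copy of that file.
    """
    if not required_files:
        raise ValueError("the program must need at least one file")

    @lru_cache(maxsize=None)
    def mttf(up_nodes):
        if not program_runs(up_nodes, placement, required_files):
            return 0.0  # failure state: global repair would start here
        total = sum(fail_rate[n] for n in up_nodes)
        # Expected sojourn time in this state, plus the expected remaining
        # time after whichever node fails first.
        return 1.0 / total + sum(
            fail_rate[n] / total * mttf(up_nodes - {n}) for n in up_nodes
        )

    return mttf(frozenset(fail_rate))


def availability_global_repair(fail_rate, placement, required_files, repair_rate):
    """Steady-state availability when a failure triggers a full system restore."""
    up_time = mean_time_to_failure(fail_rate, placement, required_files)
    return up_time / (up_time + 1.0 / repair_rate)


if __name__ == "__main__":
    rates = {"n1": 0.01, "n2": 0.01}       # failures per hour (assumed)
    files = {"f1": {"n1", "n2"}}           # f1 is replicated on both nodes
    print(availability_global_repair(rates, files, ["f1"], repair_rate=0.1))
```

Replicating a file on a second node lengthens the mean time to failure and therefore raises availability in this sketch. A local repair model would instead return individual nodes to service while the program keeps running, which requires solving the full Markov chain rather than this renewal shortcut.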

Index Terms:
distributed algorithms; Petri nets; program diagnostics; software reliability; system recovery; stochastic processes; programming theory; multiprocessing programs; dependability modeling; dependability analysis; distributed programs; stochastic Petri nets; program reliability; program availability; distributed computing system environment; program execution; repair actions; global repair mode; centralized repair team; system status restoration; failure state; local repair model; hardware support; hardware faults; program interruption; file distribution
N. Lopez-Benitez, "Dependability Modeling and Analysis of Distributed Programs," IEEE Transactions on Software Engineering, vol. 20, no. 5, pp. 345-352, May 1994, doi:10.1109/32.286421