Issue No. 05 - May (1994 vol. 20)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/32.286421
<p>Presents a modeling approach based on stochastic Petri nets to estimate the reliability and availability of programs in a distributed computing system environment. In this environment, the successful execution of a program depends on successful access to the related files distributed throughout the system. The use of stochastic Petri nets is demonstrated by extending a basic reliability model to account for repair actions when faults occur. To this end, two possible models are discussed: the global repair model, which assumes a centralized repair team that restores the system to its original status when a failure state is reached, and the local repair model, which assumes that repairs are localized to the node where the fault occurs. The global repair model is useful in evaluating the availability of programs (or of the hardware support) subject to hardware faults that are repaired globally, so the programs of interest may be interrupted. The local repair model, on the other hand, can be used to evaluate program reliability in the presence of hardware faults that are repaired without interrupting the normal operation of the system.</p>
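The contrast between the two repair models can be illustrated with a small Markov-chain sketch. This is not the stochastic Petri net model from the paper; it is a hypothetical two-node example in which the program needs a file replicated on both nodes and runs as long as at least one node is up. The names `lam` (per-node failure rate), `mu_g` (global repair rate), and `mu` (local repair rate), and the assumption of exponentially distributed times, are introduced here for illustration only.

```python
def availability_global_repair(lam: float, mu_g: float) -> float:
    """Steady-state availability of the hypothetical two-node example
    under global repair (assumed exponential rates).

    The chain cycles through:
      0 failed --(2*lam)--> 1 failed --(lam)--> failure state --(mu_g)--> 0 failed
    Repair starts only once the failure state is reached and restores
    the whole system to its original status.
    """
    mean_up = 1.0 / (2.0 * lam) + 1.0 / lam   # expected time until both nodes are down
    mean_down = 1.0 / mu_g                    # expected duration of the global repair
    return mean_up / (mean_up + mean_down)


def mttf_local_repair(lam: float, mu: float) -> float:
    """Mean time to program failure in the same example under local repair.

    A single failed node is repaired in place at rate mu while the program
    keeps running; losing both nodes is treated as an absorbing failure.
    Solving
      T0 = 1/(2*lam) + T1
      T1 = 1/(lam + mu) + (mu / (lam + mu)) * T0
    for T0 gives the closed form below.
    """
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


if __name__ == "__main__":
    lam, mu_g, mu = 0.001, 0.1, 0.1  # illustrative rates per hour, not from the paper
    print("availability (global repair):", availability_global_repair(lam, mu_g))
    print("MTTF without repair:", mttf_local_repair(lam, 0.0))
    print("MTTF with local repair:", mttf_local_repair(lam, mu))
```

In this sketch the global repair chain has no absorbing state, so the natural measure is the long-run fraction of time the program can run (availability). The local repair chain keeps the failure state absorbing, so the measure is how long the program survives (here summarized by the mean time to failure), which grows as the local repair rate increases.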
distributed algorithms; Petri nets; program diagnostics; software reliability; system recovery; stochastic processes; programming theory; multiprocessing programs; dependability modeling; dependability analysis; distributed programs; stochastic Petri nets; program reliability; program availability; distributed computing system environment; program execution; repair actions; global repair model; centralized repair team; system status restoration; failure state; local repair model; hardware support; hardware faults; program interruption; file distribution
N. Lopez-Benitez, "Dependability Modeling and Analysis of Distributed Programs," in IEEE Transactions on Software Engineering, vol. 20, no. 5, pp. 345-352, 1994.