Hierarchical Modeling of Availability in Distributed Systems
January 1995 (vol. 21 no. 1)
pp. 50-58
Distributed computing systems are attractive due to the potential improvement in availability, fault-tolerance, performance, and resource sharing. Modeling and evaluation of such computing systems is an important step in the design process of distributed systems. In this paper, we present a two-level hierarchical model to analyze the availability of distributed systems. At the higher level (user level), the availability of the tasks (processes) is analyzed using a graph-based approach. At the lower level (component level), detailed Markov models are developed to analyze the component availabilities. These models take into account the hardware/software failures, congestion and collisions in communication links, allocation of resources, and the redundancy level. A systematic approach is developed to apply the two-level hierarchical model to evaluate the availability of the processes and the services provided by a distributed computing environment. This approach is then applied to analyze some of the distributed processes of a real distributed system, Unified Workstation Environment (UWE), that is currently being implemented at AT&T Bell Laboratories.
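The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes each component is a simple two-state (up/down) Markov chain with constant failure and repair rates, that components fail independently, and that a task is available when at least one of its candidate paths has every component up. The function names, component names, and rates below are hypothetical.

```python
from itertools import product


def steady_state_availability(failure_rate, repair_rate):
    """Component level: steady-state availability of a two-state
    (up/down) Markov chain, A = mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)


def task_availability(component_avail, paths):
    """User level: probability that at least one path has all of its
    components up, assuming independent components (an assumption of
    this sketch). Computed exactly by enumerating all up/down states,
    so it only suits small examples."""
    names = sorted(component_avail)
    total = 0.0
    for states in product([True, False], repeat=len(names)):
        up = dict(zip(names, states))
        # Probability of this particular combination of up/down states.
        prob = 1.0
        for name in names:
            a = component_avail[name]
            prob *= a if up[name] else 1.0 - a
        # The task runs if some path has every component up.
        if any(all(up[c] for c in path) for path in paths):
            total += prob
    return total


# Hypothetical example: two redundant servers reached over one shared link.
avail = {
    "server_a": steady_state_availability(1.0, 9.0),
    "server_b": steady_state_availability(1.0, 9.0),
    "link": steady_state_availability(1.0, 1.0),
}
paths = [["server_a", "link"], ["server_b", "link"]]
result = task_availability(avail, paths)
```

The shared link dominates the result here, which is the kind of effect a component-level model feeding a task-level graph is meant to expose. The component models described in the abstract also cover congestion, collisions, and resource allocation, which this sketch leaves out.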

Index Terms:
Availability, reliability, task availability, distributed system availability modeling, hierarchical availability modeling, task availability optimization.
Salim Hariri, Hasan Mutlu, "Hierarchical Modeling of Availability in Distributed Systems," IEEE Transactions on Software Engineering, vol. 21, no. 1, pp. 50-58, Jan. 1995, doi:10.1109/32.341847