Hierarchical Modeling of Availability in Distributed Systems
January 1995 (vol. 21 no. 1)
pp. 50-58
Distributed computing systems are attractive because of their potential to improve availability, fault tolerance, performance, and resource sharing. Modeling and evaluating such systems is an important step in their design. In this paper, we present a two-level hierarchical model to analyze the availability of distributed systems. At the higher level (user level), the availability of the tasks (processes) is analyzed using a graph-based approach. At the lower level (component level), detailed Markov models are developed to analyze the component availabilities. These models take into account hardware and software failures, congestion and collisions in communication links, allocation of resources, and the redundancy level. A systematic approach is developed for applying the two-level hierarchical model to evaluate the availability of the processes and services provided by a distributed computing environment. This approach is then applied to analyze some of the distributed processes of a real distributed system, the Unified Workstation Environment (UWE), that is currently being implemented at AT&T Bell Laboratories.
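
The two-level idea can be illustrated with a minimal sketch. It is not the authors' model: it assumes each component is a simple two-state (up/down) Markov chain with constant failure and repair rates, that components fail independently, and that a task's availability follows from series and parallel combinations of the components it needs. All function names, component names, and rate values below are hypothetical.

```python
from math import prod


def component_availability(failure_rate, repair_rate):
    """Component level: steady-state availability of a two-state
    (up/down) Markov chain, A = mu / (lambda + mu).
    Assumption: constant rates, no congestion or allocation effects."""
    return repair_rate / (failure_rate + repair_rate)


def series_availability(availabilities):
    """All components are required; assumes independent failures."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """At least one of the redundant components is required;
    assumes independent failures."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical rates in events per hour (not taken from the paper).
workstation = component_availability(failure_rate=0.001, repair_rate=0.5)
link = component_availability(failure_rate=0.002, repair_rate=1.0)
server = component_availability(failure_rate=0.005, repair_rate=0.25)

# Task level: the task needs the workstation, the link, and at least
# one of two redundant servers.
task_availability = series_availability(
    [workstation, link, parallel_availability([server, server])]
)
print(f"task availability: {task_availability:.6f}")
```

Keeping the component models separate from the task-level combination mirrors the hierarchy described above: a more detailed component model can replace `component_availability` without changing how the task-level result is composed.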

Index Terms:
Availability, reliability, task availability, distributed system availability modeling, hierarchical availability modeling, task availability optimization.
Citation:
Salim Hariri, Hasan Mutlu, "Hierarchical Modeling of Availability in Distributed Systems," IEEE Transactions on Software Engineering, vol. 21, no. 1, pp. 50-58, Jan. 1995, doi:10.1109/32.341847