Reliability Models for Fault-Tolerant Private Network Applications
September 1994 (vol. 43 no. 9)
pp. 1039-1053

A private or corporate network connects the offices of a single large organization, such as an airline or a bank, using leased private lines. To improve the reliability of network applications, fault tolerance can be incorporated directly into the private network. In this paper, we use a state-space model to capture the effect of dynamic rerouting and repair, and we investigate how different repair-and-rerouting strategies affect reliability at the application or call level. To reduce the potentially large state space that results, we construct an approximate Markov model with a smaller state space by lumping together similar states. The lumped model includes coverage parameters that can be estimated without considering the original model in its entirety. This allows the state-space model to be solved accurately and efficiently. We compare results of the approximation technique with results obtained by a complete simulation of the original network. We expect similar approximation techniques to be effective on models with large state spaces that contain processes with many time-scales.
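
As an illustration of how a coverage parameter enters a small Markov reliability model, the following is a minimal sketch and not the model from the paper. It assumes a hypothetical three-state chain (both links up, one link up after a successful reroute, failed), a per-link failure rate `lam`, a repair rate `mu`, and a coverage probability `c` that a failure is rerouted successfully. All of these names and the chain itself are assumptions made for the example. Reliability is computed by uniformization in plain Python.

```python
import math

def reliability(t, lam, mu, c, terms=200):
    """Probability that the hypothetical 3-state chain has not failed by time t.

    States: 0 = both links up, 1 = one link up (rerouted), 2 = failed (absorbing).
    lam = per-link failure rate, mu = repair rate, c = coverage probability.
    All parameters are illustrative assumptions, not values from the paper.
    """
    # Generator matrix: a failure in state 0 is covered with probability c.
    Q = [
        [-2 * lam, 2 * lam * c, 2 * lam * (1 - c)],
        [mu, -(mu + lam), lam],
        [0.0, 0.0, 0.0],
    ]
    rate = max(2 * lam, mu + lam)  # uniformization rate, at least every exit rate
    if rate == 0 or t == 0:
        return 1.0
    # Discrete-time transition matrix P = I + Q / rate.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(3)]
         for i in range(3)]
    v = [1.0, 0.0, 0.0]              # start with both links up
    weight = math.exp(-rate * t)     # Poisson weight for k = 0
    up = 0.0
    for k in range(terms):
        up += weight * (v[0] + v[1])  # probability mass in the working states
        v = [sum(v[i] * P[i][j] for i in range(3)) for j in range(3)]
        weight *= rate * t / (k + 1)  # next Poisson weight
    return up
```

The fixed truncation at `terms` Poisson terms is adequate only when `rate * t` is modest (well below `terms`); a production solver would choose the truncation point from an error bound. With perfect coverage and no repair, the result should match the closed form for two independent links, 2e^(-lam t) - e^(-2 lam t).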

Index Terms:
fault tolerant computing; reliability; Markov processes; state-space methods; system recovery; telephone networks; telecommunication network routing; lumped parameter networks; parameter estimation; reliability models; fault-tolerant private network applications; corporate network; leased private lines; state-space model; dynamic rerouting; repair strategies; application level; call level; approximate Markov model; lumped model; coverage parameter estimation; network simulation; time-scales.
M. Balakrishnan, A. Reibman, "Reliability Models for Fault-Tolerant Private Network Applications," IEEE Transactions on Computers, vol. 43, no. 9, pp. 1039-1053, Sept. 1994, doi:10.1109/12.312113