
ASCII Text
D. Tang, R.K. Iyer, "Analysis and Modeling of Correlated Failures in Multicomputer Systems," IEEE Transactions on Computers, vol. 41, no. 5, pp. 567-577, May 1992.
BibTeX
@article{10.1109/12.142683,
  author = {D. Tang and R.K. Iyer},
  title = {Analysis and Modeling of Correlated Failures in Multicomputer Systems},
  journal = {IEEE Transactions on Computers},
  volume = {41},
  number = {5},
  issn = {0018-9340},
  year = {1992},
  pages = {567--577},
  doi = {http://doi.ieeecomputersociety.org/10.1109/12.142683},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks / ProCite / RefMan / EndNote (RIS)
TY  - JOUR
JO  - IEEE Transactions on Computers
TI  - Analysis and Modeling of Correlated Failures in Multicomputer Systems
IS  - 5
SN  - 0018-9340
SP  - 567
EP  - 577
EPD - 567-577
A1  - D. Tang
A1  - R.K. Iyer
PY  - 1992
KW  - correlated failures
KW  - multicomputer systems
KW  - DEC VAXcluster
KW  - dependability
KW  - shared resources
KW  - c-dependent model
KW  - p-dependent model
KW  - computation theory
KW  - fault tolerant computing
KW  - multiprocessing systems
VL  - 41
JA  - IEEE Transactions on Computers
ER  - 
Based on measurements from two DEC VAXcluster multicomputer systems, this paper addresses the issue of correlated failures. In particular, it discusses the characteristics of correlated failures, their impact on dependability, and their modelling. The data show that most correlated failures are related to errors in shared resources and propagate from one machine to another. Comparisons between measurement-based models and analytical models that assume failure independence show that correlated failures have a significant impact on dependability. Two validated models, the c-dependent model and the p-dependent model, are developed to evaluate the dependability of systems with correlated failures.
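Why an independence assumption can badly underestimate joint failures can be illustrated with a toy calculation. The sketch below is NOT the paper's c-dependent or p-dependent model (their definitions are not given in this abstract). It assumes a hypothetical two-machine cluster that is down only when both machines are down, where each machine is down with probability p and rho is the correlation coefficient between the two machines' down indicators. For identically distributed Bernoulli indicators the joint probability is p^2 + rho * p * (1 - p), so rho = 0 recovers the independence model. The function name `both_down_probability` and all parameter values are illustrative assumptions.

```python
def both_down_probability(p: float, rho: float) -> float:
    """Probability that both machines of a hypothetical two-node cluster are down.

    p   -- marginal probability that a single machine is down (assumed identical)
    rho -- Pearson correlation between the two machines' down indicators
           (hypothetical parameter; rho = 0 is the independence assumption)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be in [-1, 1]")
    # Joint probability of two correlated Bernoulli(p) variables.
    joint = p * p + rho * p * (1.0 - p)
    # A valid joint probability must lie in [max(0, 2p - 1), p].
    if joint < max(0.0, 2.0 * p - 1.0) - 1e-12 or joint > p + 1e-12:
        raise ValueError("rho is infeasible for this value of p")
    return joint


if __name__ == "__main__":
    p = 0.01  # hypothetical per-machine unavailability
    for rho in (0.0, 0.05, 0.1):
        print(f"rho={rho:.2f}  P(both down)={both_down_probability(p, rho):.2e}")
```

With p = 0.01, a correlation of only 0.1 raises the probability that both machines are down from 1.0e-4 to 1.09e-3, roughly an elevenfold increase, which is the kind of gap an independence-based analytical model would miss.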