Performance Analysis of a Generalized Concurrent Error Detection Procedure
January 1990 (vol. 39 no. 1)
pp. 47-62

A general procedure for error detection in complex systems, called the data block capture and analysis monitoring process, is described and analyzed. It is assumed that, in addition to being exposed to potential external fault sources, a complex system will generally contain embedded hardware and software fault mechanisms that can cause it to perform incorrect computations and/or produce incorrect output. Thus, in operation, the system continuously moves back and forth between error and no-error states. These external fault sources and internal fault mechanisms are extremely difficult to detect directly. The data block capture and analysis monitoring process is therefore concerned with detecting errors, that is, deviations from the normal performance of the system that are symptomatic of fault conditions. The process consists of repeatedly recording a fixed amount of data from a set of predetermined observation lines of the system being monitored (i.e., capturing a block of data) and then analyzing the captured block in an attempt to determine whether the system is functioning correctly.
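The capture-then-analyze cycle described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the model analyzed in the paper: the system is treated as a discrete-time two-state chain (error / no-error), the monitor is assumed to be blind while it analyzes a block, and the analysis is idealized so that any error step inside a captured block is flagged. The names `simulate_states`, `monitor`, and `error_coverage`, and all parameters, are hypothetical.

```python
import random


def simulate_states(n_steps, p_enter_error, p_leave_error, rng):
    """Simulate a two-state chain. False = no-error state, True = error state.

    Assumption: transitions happen once per time step with fixed probabilities.
    """
    state = False
    states = []
    for _ in range(n_steps):
        if state:
            if rng.random() < p_leave_error:
                state = False
        else:
            if rng.random() < p_enter_error:
                state = True
        states.append(state)
    return states


def monitor(states, capture_len, analysis_len):
    """Alternate between capturing a block and analyzing it.

    Assumption: nothing is observed during the analysis_len steps that
    follow each capture. Returns a list of (block_start, flagged) pairs.
    """
    results = []
    period = capture_len + analysis_len
    t = 0
    while t + capture_len <= len(states):
        block = states[t:t + capture_len]
        # Idealized analysis: flag the block if any captured step is an error.
        results.append((t, any(block)))
        t += period
    return results


def error_coverage(states, results, capture_len):
    """Fraction of error steps that fell inside some captured block."""
    observed = set()
    for start, _ in results:
        observed.update(range(start, start + capture_len))
    error_steps = [i for i, s in enumerate(states) if s]
    if not error_steps:
        return 1.0
    hits = sum(1 for i in error_steps if i in observed)
    return hits / len(error_steps)


if __name__ == "__main__":
    rng = random.Random(0)
    trace = simulate_states(10_000, 0.01, 0.2, rng)
    blocks = monitor(trace, capture_len=16, analysis_len=48)
    flagged = sum(1 for _, f in blocks if f)
    print("blocks captured:", len(blocks))
    print("blocks flagged:", flagged)
    print("error-step coverage:", error_coverage(trace, blocks, 16))
```

Varying `capture_len` against `analysis_len` in this toy setup shows the basic trade-off one would expect: the longer the monitor spends analyzing relative to capturing, the more short-lived error episodes pass unobserved.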

Index Terms:
generalized concurrent error detection procedure; data block capture; analysis monitoring process; external fault sources; fault mechanisms; error detection; fault tolerant computing; performance evaluation.
Citation:
D.M. Blough, G.M. Masson, "Performance Analysis of a Generalized Concurrent Error Detection Procedure," IEEE Transactions on Computers, vol. 39, no. 1, pp. 47-62, Jan. 1990, doi:10.1109/12.46280