Issue No. 01 - January (1997 vol. 46)
pp: 60-74
<p><b>Abstract</b>: This paper presents the rationale for DEPEND, a functional simulation tool that provides an integrated design and fault injection environment for system-level dependability analysis. It discusses the issues and problems of developing such a tool and describes how DEPEND tackles them. It presents techniques developed to simulate realistic fault scenarios, reduce simulation time explosion, and handle the large fault model and component domain associated with system-level analysis. Examples motivate and illustrate the benefits of the tool. To further illustrate its capabilities, DEPEND is used to simulate the Unix-based Tandem prototype fault-tolerant system, which is built on triple modular redundancy (TMR), and to evaluate how well it handles near-coincident errors caused by correlated and latent faults. Issues that affect how the system handles near-coincident errors, such as memory scrubbing, re-integration policies, and workload-dependent repair times, are also evaluated. Unlike other simulation-based dependability studies, this work validates the accuracy of the simulation model by comparing simulation results with measurements obtained from fault injection experiments conducted on a production Tandem machine.</p>
<p><b>Index Terms</b>: Simulation, fault injection, dependability analysis, correlated errors, latent errors, intercomponent dependence, object-oriented design, Tandem TMR-based prototype analysis, validation.</p>
Kumar K. Goswami, Ravishankar K. Iyer, Luke Young, "DEPEND: A Simulation-Based Environment for System Level Dependability Analysis", IEEE Transactions on Computers, vol. 46, no. 1, pp. 60-74, January 1997, doi:10.1109/12.559803
