Issue No. 04 - April (1997 vol. 30)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/2.585157
<p>Dependability evaluation involves the study of failures and errors. The destructive nature of a crash and long error latency make it difficult to identify the causes of failures in the operational environment. It is particularly hard to recreate a failure scenario for a large, complex system.</p>
<p>To identify and understand potential failures, the authors use an experiment-based approach for studying system dependability. This approach is applied during the conception, design, prototype, and operational phases.</p>
<p>To take an experiment-based approach, you must first understand a system's architecture, structure, and behavior. You need to know its tolerance for faults and failures, including its built-in detection and recovery mechanisms, and you need specific instruments and tools to inject faults, create failures or errors, and monitor their effects.</p>
<p>Engineers most often use low-cost, simulation-based fault injection to evaluate the dependability of a system that is in the conceptual and design phases. At this point, the system under study is only a series of high-level abstractions; implementation details have yet to be determined. Thus the system is simulated on the basis of simplified assumptions.</p>
<p>Simulation-based fault injection, which assumes that errors or failures occur according to a predetermined distribution, is useful for evaluating the effectiveness of fault-tolerant mechanisms and a system's dependability, and it provides timely feedback to system engineers. However, it requires accurate input parameters, which are difficult to supply: design and technology changes often complicate the use of past measurements. Testing a prototype, on the other hand, allows you to evaluate the system without any assumptions about system design.</p>
<p>Instead of injecting faults, engineers can directly measure operational systems as they handle real workloads. Measurement-based analysis uses actual data, which contains much information about naturally occurring errors and failures and sometimes about recovery attempts.</p>
<p>Although these three experimental methods have limitations, their unique values complement one another and allow for a wide spectrum of dependability studies.</p>
M. Hsueh, R. K. Iyer and T. K. Tsai, "Fault Injection Techniques and Tools," in Computer, vol. 30, no. 4, pp. 75-82, 1997.