Fault tolerance, one of the four means of attaining dependability, is intended to ensure the delivery of correct service in the presence of active faults. It is implemented by error detection followed by system recovery. Error detection identifies an erroneous system state. The subsequent system recovery transforms a system state that contains one or more errors and (possibly) faults into a state without detected errors and faults (fault handling). Exceptions and exception handling provide a general framework for structuring the fault tolerance activities in a system. By focusing on the concept of exceptional (abnormal) behaviour, as opposed to normal behaviour, exception handling makes it possible to specify the actions to be taken in the presence of abnormal events.
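
The separation between normal and exceptional behaviour described above can be illustrated with a minimal sketch. It is not taken from the paper: the `Account` class, the `detect_errors` acceptance check, the `ErroneousStateError` exception, and the checkpoint-based rollback are all hypothetical names and choices made for illustration, assuming a single in-memory object stands in for the system state.

```python
import copy


class ErroneousStateError(Exception):
    """Hypothetical exception raised when error detection finds a bad state."""


class Account:
    """Hypothetical stand-in for the system state."""

    def __init__(self, balance):
        self.balance = balance


def detect_errors(account):
    # Error detection: an acceptance check on the current system state.
    if account.balance < 0:
        raise ErroneousStateError(f"negative balance: {account.balance}")


def withdraw_fault_tolerant(account, amount):
    # Save a copy of the state before the operation (assumed error-free).
    checkpoint = copy.deepcopy(account.__dict__)
    try:
        # Normal behaviour.
        account.balance -= amount
        detect_errors(account)
        return True
    except ErroneousStateError:
        # Exceptional behaviour: recovery by rolling back to the checkpoint,
        # which yields a state without the detected error.
        account.__dict__.update(checkpoint)
        return False
```

The handler here only performs rollback; a fuller treatment would also include fault handling, for example isolating or replacing the component that caused the error, which this sketch leaves out.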
Citation:
H. Muccini, P. Pelliccione, A. Romanovsky, "Architecting Fault Tolerant Systems," Sixth Working IEEE/IFIP Conference on Software Architecture (WICSA'07), p. 43, 2007.