2013 Euromicro Conference on Digital System Design (2013)
Los Alamitos, CA, USA USA
Sept. 4, 2013 to Sept. 6, 2013
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DSD.2013.62
In this paper, we propose a polymorphic fault tolerant architecture that can be tailored to efficiently support the reliability needs of multiple applications at run-time. Today, coarse-grained reconfigurable architectures (CGRAs) host multiple applications with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the applications is neither optimal nor desirable. To reduce the fault-tolerance overhead, adaptive fault-tolerance strategies have been proposed. The proposed techniques access the reliability requirements of each application and adjust the fault-tolerance intensity (and hence overhead), accordingly. However, existing flexible reliability schemes only allow to shift between different levels of modular redundancy (duplication, triplication, etc.) and deal with only a single class of faults (e.g. soft errors). To complement these strategies, we propose energy-aware fault-tolerance that, in addition to modular redundancy, can also provide low cost, sub-modular (e.g. residue mod 3) redundancy, to cater both permanent and temporary faults. Our solution relies on an agent based control layer and a configurable fault-tolerance data path. The control layer identifies the application class and configures the data path to provide the needed reliability. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) showed that the proposed method provides flexible protection with energy overhead ranging from 3.125% to 107% for different reliability levels. Synthesis results have confirmed that the proposed architecture significantly reduces the area overhead for self-checking (59.1%) and fault tolerant (7.1%) versions, compared to the state of the art adaptive reliability techniques.
Fault tolerant systems, Computer architecture, Redundancy, Digital signal processing, Circuit faults
Syed M.A.H. Jafri, Stanislaw J. Piestrak, Kolin Paul, Ahmed Hemani, Juha Plosila, Hannu Tenhunen, "Energy-Aware Fault-Tolerant CGRAs Addressing Application with Different Reliability Needs", 2013 Euromicro Conference on Digital System Design, vol. 00, no. , pp. 525-534, 2013, doi:10.1109/DSD.2013.62