26th IEEE VLSI Test Symposium (VTS 2008)
Algorithm Level Fault Tolerance: A Technique to Cope with Long Duration Transient Faults in Matrix Multiplication Algorithms
April 27-May 1, 2008
ISBN: 0-7695-3123-7
For technologies beyond the 45 nm node, radiation-induced transients will last longer than one clock cycle. In this scenario, temporal redundancy techniques will no longer be able to cope with radiation-induced soft errors, while spatial redundancy techniques still impose high power and area overheads. The solution to this impasse is the use of algorithm-level techniques that can detect and correct errors at low cost. In this paper, a new approach to this problem is proposed and applied to the matrix multiplication algorithm. The proposed technique is compared with previously published fault tolerance techniques, and the detection and recomputation costs of both approaches are discussed.
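To make "algorithm-level detection and correction at low cost" concrete, the sketch below shows one well-known scheme of this kind: row/column checksum verification of a matrix product, in the spirit of classic algorithm-based fault tolerance (ABFT). It is not presented as the approach proposed in this paper, whose details the abstract does not give. The function names are hypothetical, and the matrices are assumed to hold integers so checksums can be compared exactly (floating-point data would need a tolerance). Verification costs O(n²) against the O(n³) product, and the recomputation granularity is a single element: only positions at the intersection of a mismatching row and a mismatching column are recomputed.

```python
def matmul(a, b):
    """Plain O(n^3) product of two matrices stored as lists of lists."""
    m, p = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(m)) for j in range(p)] for row in a]


def find_faulty_elements(a, b, c):
    """Return candidate (i, j) positions where c may differ from a*b.

    Row i of a*b sums to (row i of a) . (row sums of b);
    column j of a*b sums to (column sums of a) . (column j of b).
    Both checks cost O(n^2) instead of recomputing the whole product.
    """
    n, m, p = len(a), len(b), len(b[0])
    b_row_sums = [sum(b[k]) for k in range(m)]                   # b . e
    a_col_sums = [sum(a[i][k] for i in range(n)) for k in range(m)]  # e^T . a
    bad_rows = [i for i in range(n)
                if sum(c[i]) != sum(a[i][k] * b_row_sums[k] for k in range(m))]
    bad_cols = [j for j in range(p)
                if sum(c[i][j] for i in range(n))
                != sum(a_col_sums[k] * b[k][j] for k in range(m))]
    # Every intersection of a bad row and a bad column is a candidate.
    # With several errors this may include correct elements (recomputing
    # them is harmless), and errors that cancel in a sum go undetected.
    return [(i, j) for i in bad_rows for j in bad_cols]


def correct(a, b, c):
    """Recompute only the candidate elements of c, in place, and return c."""
    for i, j in find_faulty_elements(a, b, c):
        c[i][j] = sum(a[i][k] * b[k][j] for k in range(len(b)))
    return c


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = matmul(a, b)
    print(c)                              # [[19, 22], [43, 50]]
    c[1][0] += 1000                       # simulate a transient corrupting one element
    print(find_faulty_elements(a, b, c))  # [(1, 0)]
    print(correct(a, b, c))               # [[19, 22], [43, 50]]
```

Checking sums rather than re-executing the multiplication is what keeps the cost low. Recomputing one element rather than a row, a block, or the whole matrix is one point on the recomputation-granularity trade-off named in the index terms.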
Index Terms:
fault tolerance, radiation effects, long transients, recomputation granularity
Citation:
Carlos Arthur Lang Lisboa, Costas Argyrides, Dhiraj Kumar Pradhan, Luigi Carro, "Algorithm Level Fault Tolerance: A Technique to Cope with Long Duration Transient Faults in Matrix Multiplication Algorithms," in Proc. 26th IEEE VLSI Test Symposium (VTS 2008), pp. 363-370, 2008.