Relyzer: Application Resiliency Analyzer for Transient Faults
May-June 2013 (vol. 33 no. 3)
pp. 58-66
Siva Kumar Sastry Hari, University of Illinois at Urbana-Champaign
Sarita V. Adve, University of Illinois at Urbana-Champaign
Helia Naeimi, Intel
Future microprocessors need low-cost solutions for reliable operation in the presence of failure-prone devices. A promising approach is to detect hardware faults by deploying low-cost software-level symptom monitors. However, there remains a nonnegligible risk that some faults will escape these detectors and produce silent data corruptions (SDCs). Evaluating and bounding SDCs is therefore crucial for low-cost resiliency solutions. The authors present Relyzer, an approach that can systematically analyze all application fault sites and identify virtually all SDC-causing program locations. Instead of performing fault injections on all possible application-level fault sites, which is impractical, Relyzer carefully picks a small subset. It employs novel fault-pruning techniques that reduce the number of fault sites by either predicting their outcomes or showing them to be equivalent to others. Results show that 99.78 percent of faults are pruned across the 12 studied workloads, reducing the time for a complete application resiliency evaluation by 2 to 6 orders of magnitude. Relyzer is the first approach able to list virtually all SDC-vulnerable program locations, a capability that is critical in designing low-cost application-centric resiliency solutions. Relyzer also opens new avenues of research in designing error-resilient programming models, as well as even faster (and simpler) evaluation methodologies.
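The abstract does not spell out how fault sites are shown to be equivalent. The sketch below is a hypothetical, simplified illustration of equivalence-based pruning in general, not Relyzer's actual implementation. It assumes that dynamic instances of the same static instruction, with the same bit flipped and followed by the same next few branch outcomes, will behave alike, so only one "pilot" per class needs a real fault injection. The names `FaultSite`, `prune_by_equivalence`, `pick_pilots`, `pruned_fraction`, and the `depth` parameter are illustrative assumptions, and the example sites are invented data, not results from the article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical grouping key: (static PC, flipped bit, truncated branch path).
Key = Tuple[int, int, Tuple[bool, ...]]


@dataclass(frozen=True)
class FaultSite:
    """A dynamic fault site: one bit of one dynamic instruction instance."""
    dyn_index: int            # position of the dynamic instance in the trace
    pc: int                   # address of the static instruction
    bit: int                  # bit position to flip
    path: Tuple[bool, ...]    # outcomes of the branches that follow (assumed)


def prune_by_equivalence(sites: List[FaultSite], depth: int) -> Dict[Key, List[FaultSite]]:
    """Group sites that share a PC, a bit, and their next `depth` branch outcomes."""
    classes: Dict[Key, List[FaultSite]] = defaultdict(list)
    for site in sites:
        classes[(site.pc, site.bit, site.path[:depth])].append(site)
    return dict(classes)


def pick_pilots(classes: Dict[Key, List[FaultSite]]) -> List[FaultSite]:
    """Pick one representative (the earliest dynamic instance) per class."""
    return [min(members, key=lambda s: s.dyn_index) for members in classes.values()]


def pruned_fraction(sites: List[FaultSite], depth: int) -> float:
    """Fraction of sites that need no injection of their own."""
    pilots = pick_pilots(prune_by_equivalence(sites, depth))
    return 1 - len(pilots) / len(sites)


# Made-up example: two static instructions, six dynamic fault sites.
example_sites = [
    FaultSite(0, 0x400, 3, (True, False)),
    FaultSite(5, 0x400, 3, (True, False)),
    FaultSite(9, 0x400, 3, (False, False)),
    FaultSite(2, 0x404, 3, (True, True)),
    FaultSite(7, 0x404, 3, (True, True)),
    FaultSite(11, 0x404, 3, (True, True)),
]

if __name__ == "__main__":
    pilots = pick_pilots(prune_by_equivalence(example_sites, depth=2))
    frac = pruned_fraction(example_sites, depth=2)
    print(f"{len(pilots)} pilots for {len(example_sites)} sites, pruned {frac:.0%}")
    # prints "3 pilots for 6 sites, pruned 50%"
```

Under this assumption, the outcome observed for each pilot is attributed to every member of its class. A larger `depth` yields finer, more conservative classes at the cost of more injections; a smaller one prunes more aggressively but risks grouping sites that actually behave differently.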
Index Terms:
Computer architecture, Microprocessors, Fault diagnosis, Hardware, Costs, Computer programs, low-cost hardware resiliency, silent data corruption, transient faults
Citation:
Siva Kumar Sastry Hari, Sarita V. Adve, Helia Naeimi, Pradeep Ramachandran, "Relyzer: Application Resiliency Analyzer for Transient Faults," IEEE Micro, vol. 33, no. 3, pp. 58-66, May-June 2013, doi:10.1109/MM.2013.30