Parallel and Distributed Processing Symposium, International (2008)
Miami, FL, USA
Apr. 14, 2008 to Apr. 18, 2008
Jing Yu, Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave., 61820, USA
Maria Jesus Garzaran, Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave., 61820, USA
Marc Snir, Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave., 61820, USA
Dramatic increases in the number of transistors that can be integrated on a chip make processors more susceptible to radiation-induced transient errors. For commodity chips, which are cost- and energy-constrained, software approaches can play a major role in fault detection because they can be tailored to different reliability and performance requirements. However, software approaches add significant performance overhead because they replicate instructions and add checking instructions to compare the results. To make software checking approaches more attractive, we use compiler techniques to identify “unnecessary” replicas and checking instructions. In this paper, we present three techniques. The first technique uses Boolean logic to identify code patterns that correspond to outcome-tolerant branches. The second technique identifies address checks before loads and stores that can be removed with different degrees of fault coverage. The third technique identifies the checking instructions and shadow registers that are unnecessary when the register file is protected in hardware. By combining the three techniques, the overheads of software approaches can be reduced by an average of 50%.
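The abstract describes software checking only at a high level. As a rough illustration, the sketch below models instruction replication with shadow registers on a hypothetical toy register machine: every value is computed twice, and a checking instruction compares the master and shadow copies before a store. The `check_addresses` flag is a hypothetical knob that removes the address check before the store, showing the kind of overhead-versus-coverage trade-off the second technique explores. This is an assumption-laden toy, not the authors' compiler pass; the class, method, and register names are all invented for illustration, and the outcome-tolerant branch and protected-register-file techniques are not modeled.

```python
from dataclasses import dataclass, field


class FaultDetected(Exception):
    """Raised when a master register and its shadow copy disagree."""


@dataclass
class CheckedMachine:
    # Hypothetical knob: keep (True) or remove (False) the address check before stores.
    check_addresses: bool = True
    regs: dict = field(default_factory=dict)     # master register copies
    shadow: dict = field(default_factory=dict)   # replicated (shadow) register copies
    memory: dict = field(default_factory=dict)
    checks_executed: int = 0                     # counts checking instructions run

    def li(self, r, value):
        # Load-immediate: the original instruction and its replica.
        self.regs[r] = value
        self.shadow[r] = value

    def add(self, rd, ra, rb):
        # Original add on master registers, replicated on shadow registers.
        self.regs[rd] = self.regs[ra] + self.regs[rb]
        self.shadow[rd] = self.shadow[ra] + self.shadow[rb]

    def _check(self, r):
        # Checking instruction: compare master and shadow copies of register r.
        self.checks_executed += 1
        if self.regs[r] != self.shadow[r]:
            raise FaultDetected(f"mismatch in {r}")

    def store(self, r_addr, r_val):
        # The stored value is always checked; the address check is optional.
        if self.check_addresses:
            self._check(r_addr)
        self._check(r_val)
        self.memory[self.regs[r_addr]] = self.regs[r_val]


def run(machine, flip=None):
    """Compute mem[100] = 2 + 3, optionally flipping one bit of a master register."""
    machine.li("r1", 2)
    machine.li("r2", 3)
    machine.li("r3", 100)
    if flip is not None:
        reg, bit = flip
        machine.regs[reg] ^= 1 << bit  # simulated transient error in the master copy
    machine.add("r4", "r1", "r2")
    machine.store("r3", "r4")
    return machine
```

With both checks enabled, a fault-free run stores 5 at address 100 after two checks, and a bit flip in `r1` propagates to `r4` and raises `FaultDetected`. With `check_addresses=False`, only one check runs, but a bit flip in the address register `r3` goes unnoticed and the value is silently written to address 101: fewer checking instructions, at the cost of fault coverage.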
J. Yu, M. J. Garzaran, and M. Snir, "Efficient software checking for fault tolerance," 2008 IEEE International Parallel & Distributed Processing Symposium (IPDPS), Miami, FL, 2008, pp. 1-5.