Issue No.04 - October-December (2010 vol.7)
pp: 439-445
P. Bernardi, Politecnico di Torino, Torino
L.M. Bolzani Poehls, Pontificia Universidade Catolica do Rio Grande do Sul, Porto Alegre
M. Grosso, Politecnico di Torino, Torino
M. Sonza Reorda, Politecnico di Torino, Torino
ABSTRACT
Critical applications based on Systems-on-Chip (SoCs) require techniques that ensure a sufficient level of reliability. Several techniques have been proposed to improve the detection and correction of faults affecting SoCs. This paper proposes a hybrid approach that detects and corrects the effects of transient faults in SoC data memories and caches. The proposed solution combines software modifications, which are easy to automate, with a hardware module that is independent of the specific application. The method fits well in a typical SoC design flow and is shown to achieve a better trade-off between results and costs than the corresponding purely hardware or purely software techniques. In particular, the proposed approach offers the same fault-detection and -correction capabilities as a purely software-based approach, while introducing nearly the same low memory and performance overhead as a purely hardware-based one.
INDEX TERMS
Fault tolerance, SoCs, transient faults, online test.
CITATION
P. Bernardi, L.M. Bolzani Poehls, M. Grosso, M. Sonza Reorda, "A Hybrid Approach for Detection and Correction of Transient Faults in SoCs", IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 4, pp. 439-445, October-December 2010, doi:10.1109/TDSC.2010.33