Issue No.02 - March/April (2012 vol.9)
pp: 159-172
Antonio Martínez-Álvarez , University of Alicante, San Vicente del Raspeig
Sergio A. Cuenca-Asensi , University of Alicante, San Vicente del Raspeig
Felipe Restrepo-Calle , University of Alicante, San Vicente del Raspeig
Francisco R. Palomo Pinto , University of Sevilla, Sevilla
Hipólito Guzmán-Miranda , University of Sevilla, Sevilla
Miguel A. Aguirre , University of Sevilla, Sevilla
ABSTRACT
Protecting processor-based systems against the harmful effects of transient faults (soft errors) is gaining importance as technology shrinks. At the same time, for large segments of the embedded market, parameters such as cost and performance remain as important as reliability. This paper presents a compiler-based methodology for facilitating the design of fault-tolerant embedded systems. The methodology is supported by an infrastructure that makes it easy to combine hardware and software soft-error mitigation techniques so as to best satisfy both the usual design constraints and the dependability requirements. It is based on a generic microprocessor architecture that facilitates the implementation of software-based techniques, providing a uniform hardening core, isolated from the target, that allows the automatic generation of protected source code (hardened code). Two case studies are presented. In the first, several software-based mitigation techniques are implemented and evaluated, showing the flexibility of the infrastructure. In the second, a customized fault-tolerant embedded system is designed by combining selective protection in both hardware and software. Several trade-offs among performance, code size, reliability, and hardware cost are explored. Results show the applicability of the approach. Among the software-based mitigation techniques developed, a novel selective version of the well-known SWIFT-R is presented.
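As a rough illustration of the kind of software-based mitigation the abstract refers to, the sketch below models the general idea behind triplication-based recovery schemes such as SWIFT-R: a value is computed three times and a bitwise majority vote masks a single upset in any one copy. It is only a minimal sketch under stated assumptions, not the paper's hardening core. The names `majority`, `flip_bit`, and `hardened_add` are hypothetical, and the fault is injected by hand rather than by a real fault-injection platform.

```python
from typing import Optional


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(value: int, bit: int) -> int:
    """Model a single event upset by inverting one bit of a value."""
    return value ^ (1 << bit)


def hardened_add(x: int, y: int,
                 upset_copy: Optional[int] = None,
                 upset_bit: int = 0) -> int:
    """Add x and y three times, optionally corrupt one copy, then vote.

    upset_copy and upset_bit are hypothetical knobs used only to
    simulate a soft error in one of the three redundant results.
    """
    copies = [x + y, x + y, x + y]  # three redundant computations
    if upset_copy is not None:
        copies[upset_copy] = flip_bit(copies[upset_copy], upset_bit)
    return majority(copies[0], copies[1], copies[2])


if __name__ == "__main__":
    print(hardened_add(3, 4))                              # fault-free run
    print(hardened_add(3, 4, upset_copy=1, upset_bit=2))   # one copy upset
```

Both calls return the same sum, because a single corrupted copy is outvoted by the two intact ones. A selective scheme would apply this protection only to a chosen subset of values, trading some reliability for lower code-size and performance overhead.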
INDEX TERMS
Fault tolerance, reliability, soft error, single event upset (SEU), embedded systems design, hardware/software co-design, design space exploration.
CITATION
Antonio Martínez-Álvarez, Sergio A. Cuenca-Asensi, Felipe Restrepo-Calle, Francisco R. Palomo Pinto, Hipólito Guzmán-Miranda, Miguel A. Aguirre, "Compiler-Directed Soft Error Mitigation for Embedded Systems", IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 2, pp. 159-172, March/April 2012, doi:10.1109/TDSC.2011.54