COFTA: Hardware-Software Co-Synthesis of Heterogeneous Distributed Embedded Systems for Low Overhead Fault Tolerance
April 1999 (vol. 48 no. 4)
pp. 417-441

Abstract: Embedded systems employed in critical applications demand high reliability and availability in addition to high performance. Hardware-software co-synthesis of an embedded system is the process of partitioning, mapping, and scheduling its specification into hardware and software modules to meet performance, cost, reliability, and availability goals. In this paper, we address the problem of hardware-software co-synthesis of fault-tolerant real-time heterogeneous distributed embedded systems. Fault detection capability is imparted to the embedded system by adding assertion and duplicate-and-compare tasks to the task graph specification prior to co-synthesis. The dependability (reliability and availability) of the architecture is evaluated during co-synthesis. Our algorithm, called COFTA (Co-synthesis Of Fault-Tolerant Architectures), allows the user to specify multiple types of assertions for each task. It uses the assertion or combination of assertions that achieves the required fault coverage without incurring excessive overhead. We propose new methods to: 1) perform fault-tolerance-based task clustering, which determines the best placement of assertion and duplicate-and-compare tasks, 2) derive the best error recovery topology using a small number of extra processing elements, 3) exploit multidimensional assertions, and 4) share assertions to reduce the fault tolerance overhead. Our algorithm can tackle multirate systems commonly found in multimedia applications. Application of the proposed algorithm to a large number of real-life telecom transport system examples (the largest example consisting of 2,172 tasks) shows its efficacy. For fault-secure architectures, which have only fault detection capabilities, COFTA is able to achieve up to 48.8 percent and 25.6 percent savings in embedded system cost over architectures employing duplication and task-based fault tolerance techniques, respectively. The average cost overhead of COFTA fault-secure architectures over simplex architectures is only 7.3 percent. In the case of fault-tolerant architectures, which can not only detect but also tolerate faults, COFTA is able to achieve up to 63.1 percent and 23.8 percent savings in embedded system cost over architectures employing triple-modular redundancy and task-based fault tolerance techniques, respectively. The average cost overhead of COFTA fault-tolerant architectures over simplex architectures is only 55.4 percent.
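
The assertion-selection idea in the abstract (use the assertion or combination of assertions that reaches the required fault coverage at low overhead) can be illustrated with a small sketch. This is not the COFTA algorithm itself; it is a brute-force illustration under stated assumptions. The `Assertion` type, the `pick_assertions` function, the example assertion names, the coverage and overhead numbers, the independence model for combined coverage, and the additive overhead model are all hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import prod
from typing import NamedTuple, Optional, Sequence, Tuple


class Assertion(NamedTuple):
    name: str
    coverage: float  # assumed fraction of faults detected, in [0, 1]
    overhead: float  # assumed cost of running the check (arbitrary units)


def combined_coverage(assertions: Sequence[Assertion]) -> float:
    # Assumption: assertions detect faults independently, so a fault escapes
    # only if every assertion in the combination misses it.
    return 1.0 - prod(1.0 - a.coverage for a in assertions)


def pick_assertions(
    candidates: Sequence[Assertion], required_coverage: float
) -> Optional[Tuple[Assertion, ...]]:
    """Return the cheapest combination meeting the coverage target, or None.

    Assumption: overheads add up. The search is exhaustive, which is only
    practical for the handful of assertions typically available per task.
    """
    best = None
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if combined_coverage(combo) < required_coverage:
                continue
            cost = sum(a.overhead for a in combo)
            if best is None or cost < best[0]:
                best = (cost, combo)
    return None if best is None else best[1]


# Hypothetical assertions available for one task.
candidates = [
    Assertion("parity", coverage=0.80, overhead=1.0),
    Assertion("checksum", coverage=0.90, overhead=2.5),
    Assertion("range_check", coverage=0.60, overhead=0.5),
]

chosen = pick_assertions(candidates, required_coverage=0.90)
unreachable = pick_assertions(candidates, required_coverage=0.999)
```

In this sketch, a result of `None` means no combination of the available assertions reaches the target. One plausible reading of the abstract is that such a task would then be protected by a duplicate-and-compare task instead, but that fallback is an assumption here, not a statement of the paper's procedure.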


Index Terms:
Allocation, distributed systems, embedded systems, hardware-software co-synthesis, scheduling, system synthesis.
Bharat P. Dave, Niraj K. Jha, "COFTA: Hardware-Software Co-Synthesis of Heterogeneous Distributed Embedded Systems for Low Overhead Fault Tolerance," IEEE Transactions on Computers, vol. 48, no. 4, pp. 417-441, April 1999, doi:10.1109/12.762534