May 1995 (Vol. 12, No. 3) pp. 1316
0740-7459/95/$31.00 © 1995 IEEE
Published by the IEEE Computer Society
Published by the IEEE Computer Society
Reliability and Safety of RealTimeSystems
PDFs Require Adobe Acrobat
Not a day goes by that the general public does not come intocontact with a realtime system. As their numbers andimportance grow, so do the implications for software developers.
Applications that call for realtime systems are particularlysusceptible to failures. Our challenge is to design, analyze,and build such systems to prevent failures or (at least) tomitigate their effect on operations. The theme articles inthis issue, summarized in the box ( "Article Summaries: Safety and Reliability" ), explore different ways to meet this challenge.
What differentiates the development of realtime systemssoftware from other applications? How do reliability and safetyfigure into their development?
A realtime system must respond to externally generated stimuliwithin a finite, specifiable time delay. Realtime systems aretypically embedded systems that interface directly to the physicalequipment they operate, monitor, and control. Examples of suchsystems are flight and weaponscontrol systems, airtraffic controlsystems, telecommunicationsswitching and network equipment,manufacturing processcontrol, and even speechrecognitionsystems, which are beginning to appear in telecommunicationsnetworks and personal computers.
Not a day passes that we don't come into contact with arealtime system. And now they are becoming more prevalent incritical applications. A failure in a critical application such as atelecommunications system may result in great financial loss; in aflightcontrol system it may result in loss of life.
More effort must be expended to analyze the reliability andsafety of such systems. Analysis of hardware components incritical applications has matured over the years and commonlyfollowed techniques have emerged. However, methods andtechniques for analyzing the reliability and safety of the softwarepart of critical applications are relatively new and still maturing.Yet the vulnerability of the system to software failures is on therise and may (and in some cases do) exceed hardware failures.
Software is not only becoming more prevalent in realtimesystems, it is becoming a larger part of realtime systems. Bylarger, we mean the amount of effort expended in designing andimplementing the software over total expended effort.
What differentiates realtime software development from othersoftware?
♦ First, its design is resourceconstrained. The primaryresource that is constrained is time. Depending on processorspeed, this time constraint equates to the number ofprocessing cycles required to complete the task. However,time is not the only resource that can be constrained. Theuse of main memory may also be constrained. In fact,designers may trade off reducing processing cycles at theexpense of using more memory.
♦ Second, realtime software is compact yet complex. Eventhough the entire system may have millions of lines of code,the timecritical part is but a small fraction of the totalsoftware. Yet these timecritical parts are highly complex, withmuch of the complexity introduced to conserve constrainedresources.
♦ Third, unlike other software systems, realtime systems donot have the luxury of failurerecovery mechanisms. Theremay not be a human around to help the software recoverfrom failure. Such software must detect when failure occurs,continue to operate despite the failure, contain the damage tosurrounding data and processes, and recover quickly so as tominimize operating problems.
In developing realtime software, more time is spent in design,especially analyzing ways to improve performance and enhancereliability and safety. Development of most other software focuseson how to handle a normal situation, but realtime,criticalapplication development also focuses on how to handlethe abnormal situation. Also, more time is spent in testingrealtime systems.
Testing not only removes faults (the underlying source ofsoftware failure) and hence improves its reliability, it alsodemonstrates reliability and safety by running software underfield conditions. For safetycritical software, additional time isspent in analyzing failure modes that contribute to safetyhazards.
Address questions about this issue to Everett at AT&T Bell Labs,Room 3D416, PO Box 638, Murrary Hill, NJ 079740636;w.w.everettatt.com or to Honiden at Systems and SoftwareEngineering Laboratory, Toshiba Corp., 70 Yanagicho, Saiwaiku,Kawasaki 210, Japan; honidenssel.toshiba.co.jp.