Carnegie Mellon University
Pages: pp. 14-18
Designing automotive computer systems involves satisfying stringent cost requirements and providing highly reliable operation in adverse environments. Despite this cost/performance challenge, electronic and computer-based features continue proliferating in vehicles, leading to widespread use of embedded real-time control networks. 1 In the next few years, networked control will reach such a sophistication level that using additional performance and safety features will require giving up mechanical linkages between the driver and the vehicle. Networked computers will control vehicle operation just as they do today in fly-by-wire aircraft designs.
The move to vehicles with throttle-by-wire, brake-by-wire, and steer-by-wire control systems (generically called X-by-wire) has already started and may be complete within this decade. Among the anticipated benefits: better fuel economy, better vehicle performance in adverse conditions, and advances in safety features such as collision warning and even automatic collision avoidance systems. 2 One of the biggest questions remains: Which communication protocol should we entrust to determine the timing and content of messages to be sent on wires (or eventually optical fibers) that will control safety-critical vehicle operations?
Achieving safe operation of a distributed computer system with full authority over vehicle operations is no small task. Critical to X-by-wire designs are the safety components of software, hardware, and in particular, computer networks that coordinate vehicle functions. Additionally, designing, deploying, maintaining, and operating such systems requires a significantly more rigorous approach than is now used for electronic convenience features in cars, and these activities must take place on an unprecedented scale of deployment compared to the traditional safety-critical areas of aerospace, medical, and nuclear power applications.
This issue of IEEE Micro focuses on the needs of and approaches to X-by-wire automotive networks. Clearly, these networks must continue to safely function during component failures, difficult conditions, and minor vehicle damage. What is less clear is the best way to build such a system, as reflected by the competition among different network protocol standards for dominance in the automotive market. Market pressure will force convergence to only one X-by-wire standard. Any resulting tradeoffs must produce safe but affordable systems. The articles in this special issue address network protocol alternatives and different methodologies for X-by-wire design.
Messages sent on an embedded control network must meet tight real-time deadlines, making the method a protocol uses to allocate network access to transmitters central to system design. Primarily, two network arbitration methods compete for use in X-by-wire applications. The first resolves message transmission time conflicts at runtime. The second, based on time-division multiplexing, determines message schedules entirely at design time. The "FlexRay" sidebar discusses a hybrid alternative.
One approach sends the highest-priority queued message next, regardless of which transmitter is making the attempt. The widely deployed controller area network (CAN) protocol uses this approach successfully in a variety of applications, both in vehicles and elsewhere. This approach lets designers map task-scheduling problems into powerful, well-understood models such as rate monotonic scheduling (or the closely related deadline monotonic scheduling), in which the most frequent tasks receive the highest priority. Such models can guarantee schedulability during fault-free operation without the need for a fixed schedule, which makes the approach especially suitable for event-based system designs. It also provides flexibility and graceful degradation during overloads, whether from design errors or exceptional operating conditions, and permits sending different mixes of messages depending on operating modes. The drawback to priority-based approaches is that determining the worst-case transmission time of a low-priority message can be complex, complicating the design process. Nevertheless, designers can place an upper bound on the worst-case time for any given set of message traffic, given a reasonable set of assumptions.
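The priority rule described above can be illustrated with a small simulation. This is a minimal sketch, assuming CAN-style wired-AND signalling in which a dominant bit (0) overrides a recessive bit (1), so the numerically lowest identifier wins. The function name `arbitrate` and the 11-bit default are illustrative choices, not part of any controller API.

```python
def arbitrate(pending_ids, id_bits=11):
    """Return the identifier that wins bitwise arbitration, or None if idle.

    Assumption: every contender starts transmitting its identifier at the
    same instant, most significant bit first, and identifiers are unique.
    """
    contenders = set(pending_ids)
    if not contenders:
        return None
    if any(i < 0 or i >= (1 << id_bits) for i in contenders):
        raise ValueError("identifier does not fit in id_bits")
    for bit in reversed(range(id_bits)):
        # Wired-AND bus: the line reads 0 if any contender drives 0.
        bus_level = min((i >> bit) & 1 for i in contenders)
        # A node that sent recessive (1) but reads dominant (0) backs off.
        contenders = {i for i in contenders if ((i >> bit) & 1) == bus_level}
    return contenders.pop()


if __name__ == "__main__":
    # Three nodes queue messages at once; the lowest identifier goes first.
    print(hex(arbitrate([0x65, 0x10, 0x7FF])))
```

Note that the winner depends only on the identifiers in the queue, not on which node holds them, which is what lets designers treat the bus as a single priority-ordered resource.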
A competing approach uses time-division multiplexing to allocate each message a unique access time within a periodic transmission schedule. The time-triggered architecture (TTA) uses this approach. It eliminates the need for an explicit collision-resolution mechanism, because each transmitter determines its turn to access the network by checking a time reference. Additionally, with timing entirely determined at design time, computing the worst-case timing of messages becomes simple. The main disadvantage of this approach is reduced flexibility in system design: designers must schedule messages periodically or map them into a periodic transmission framework. TTA has a particularly strong theoretical foundation, based on an ability to withstand an arbitrary fault at runtime. See the "Theoretical underpinnings" sidebar for more on this subject.
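The slot lookup described above can be sketched in a few lines. This is an illustration only, assuming equal-length slots, one slot per node per cycle, and a perfectly synchronized global time base. The names `slot_owner`, `may_transmit`, and `worst_case_delay_us` are hypothetical and do not come from TTA.

```python
def slot_owner(global_time_us, slot_len_us, nodes):
    """Return the node that owns the bus at the given global time."""
    cycle_us = slot_len_us * len(nodes)
    phase_us = global_time_us % cycle_us
    return nodes[phase_us // slot_len_us]


def may_transmit(node, global_time_us, slot_len_us, nodes):
    """A node transmits only while the shared time reference is in its slot."""
    return slot_owner(global_time_us, slot_len_us, nodes) == node


def worst_case_delay_us(slot_len_us, nodes):
    """Upper bound from queuing a message to the end of its transmission.

    Assumption: the message is queued just after the node's own slot has
    started, so it waits up to one full cycle and then occupies one slot.
    """
    return slot_len_us * len(nodes) + slot_len_us


if __name__ == "__main__":
    nodes = ["brake", "steer", "throttle"]
    print(slot_owner(250, 100, nodes))
    print(worst_case_delay_us(100, nodes))
```

Because the bound is a fixed function of the schedule, it can be computed once at design time, which is the simplicity the paragraph above refers to.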
Some common misperceptions about both time- and event-triggered approaches bear scrutiny. In many cases, claimed drawbacks for one approach have corresponding issues in the other. Handling messages with short deadlines is an example. A time-triggered approach must schedule frequent slots for such messages to guarantee meeting a short deadline. A priority-based system does not have to reserve a specific time slot; however, it still must reserve adequate bandwidth to ensure schedulability.
Another misperception has to do with worst-case performance of priority-based systems. A priority-based system could have a high-priority task hogging bandwidth, causing missed deadlines in lower priority tasks. This can only result from an error of some sort, however, since designers can fully understand worst-case behavior at design time. Time-triggered systems do not have this vulnerability but require careful analysis to ensure that aperiodic external events are mapped appropriately into periodic network messages. Design errors are possible in both priority- and time-based systems; they simply manifest themselves in different ways. Additionally, both approaches can effectively employ mechanisms, such as bus guardians, to prevent transmitters from consuming excessive bandwidth.
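To make the design-time worst-case claim concrete, the following is a simplified sketch in the style of published fixed-point response-time analyses for prioritized, non-preemptive buses. It is not the analysis used by any article in this issue. The assumptions are strong: strictly periodic messages, no release jitter, no transmission errors, deadlines equal to periods, and a pessimistic blocking term of one frame. All names and the example numbers are hypothetical.

```python
def worst_case_response_us(messages, bit_time_us):
    """Bound the queuing-plus-transmission time of each message.

    messages: list of (name, period_us, tx_time_us), highest priority first.
    Returns {name: bound_us}, with None where no bound within the period
    could be established under the stated assumptions.
    """
    bounds = {}
    for i, (name, period, tx) in enumerate(messages):
        higher = messages[:i]
        # Non-preemptive bus: a frame already on the wire must finish first.
        blocking = max(c for _, _, c in messages[i:])
        wait = blocking
        while True:
            # Integer ceiling of (wait + bit_time) / period for each
            # higher-priority message that can be released while we wait.
            interference = sum(
                -(-(wait + bit_time_us) // t) * c for _, t, c in higher
            )
            next_wait = blocking + interference
            if next_wait + tx > period:
                bounds[name] = None      # deadline (= period) not guaranteed
                break
            if next_wait == wait:
                bounds[name] = wait + tx  # fixed point reached
                break
            wait = next_wait
    return bounds


if __name__ == "__main__":
    traffic = [("brake", 5000, 270), ("steer", 10000, 270),
               ("status", 100000, 270)]
    print(worst_case_response_us(traffic, bit_time_us=2))
```

A `None` result signals that the message set needs redesign (longer periods, shorter frames, or a faster bus) rather than that a deadline will certainly be missed, since the test is only sufficient.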
Time- and priority-based approaches differ in complexity in terms of when message scheduling happens. Time-triggered approaches deal with this complexity at design time, requiring a schedule for each operating mode to ensure that transmitters never transmit at the same time. With a prioritized approach, the complexity shifts to runtime arbitration among competing transmitters. A subtle point to take into account is that using a prioritized protocol does not require the system designer to actually depend upon priorities to determine message transmission times. For example, messages sent on a CAN system can be spaced out by transmitters, preventing collisions during normal operation. This effectively converts the system into a time-triggered design that has a priority-based safety net in place in case of unexpected message overloads. Of course, TTA provides other services beyond arbitration that yield fault-tolerance benefits, but CAN does not force designers to take an event-triggered approach. Similarly, TTA-based designs can allocate time for event-based messages.
The perspective should shift from debating the inherent superiority of various protocols and their approaches to realizing that each simply presents different tradeoffs—both in protocol capabilities and in the set of assumptions used in system design. Assuming that a protocol functions as promised, we must next consider system-level architecture and tradeoff decisions. These tradeoffs hinge on assumptions about system properties, but debate continues on the ideal set of system properties. For example, many protocols provide a high-quality time reference value for coordinating distributed actions, but even that can be done in several different ways. This issue's articles discuss not only protocols but also elements of system design with a specific emphasis on failure mode analysis and fault-tolerant design important in X-by-wire systems.
"Design and Analysis of a Robust Real-Time Engine Control Network" by Michael Ellims, Stephen Parker, and James Zurlo presents a case study of the design steps for an engine control system that uses networked messages. These steps include analyzing hazards, defining messages to be sent on a CAN network, and analyzing worst-case events for a prioritized network message workload. The authors describe a process representative of the best practices in commercial applications today.
"CAN for Critical Embedded Automotive Networks" by Lars-Berno Fredriksson analyzes possibilities for applying the CAN protocol to an X-by-wire system. In particular, the author discusses methods for layering a time-triggered approach on top of CAN and the resulting tradeoffs and issues.
In "Time-Triggered Architecture: A Consistent Computing Platform," Reinhard Maier, Günther Bauer, Georg Stöger, and Stefan Poledna present the approach used with TTA. The authors discuss an architecture, rather than just a protocol, reflecting the holistic system design philosophy employed in this approach. Integral elements of this approach include end-to-end system scheduling, replica determinism for fault tolerance, and tool-based generation of static schedules. Accounting for all aspects of system design in a tightly coupled manner permits significant system optimization and increased efficiency while providing fault-tolerant, real-time network services.
"The FTT-CAN Protocol for Flexibility in Safety-Critical Systems" by Joaquim Ferreira, Paulo Pedreiras, Luís Almeida, and José A. Fonseca proposes a way to schedule both time- and event-triggered messages on CAN. The general approach breaks time up into cycles and divides each cycle's period into an asynchronous and synchronous window to avoid interference between the two message types.
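The cycle-and-window idea can be sketched as follows. This is an illustration of the general partitioning only, not the FTT-CAN specification: the ordering of the two windows within a cycle, the absence of any per-cycle control message, and the names `window_at` and `may_send` are all assumptions made for the example.

```python
def window_at(time_us, cycle_us, async_len_us):
    """Classify a point in time as falling in the asynchronous or
    synchronous window (assumed order: asynchronous window first)."""
    phase_us = time_us % cycle_us
    return "asynchronous" if phase_us < async_len_us else "synchronous"


def may_send(kind, time_us, tx_us, cycle_us, async_len_us):
    """Allow a transmission only if it fits entirely inside its own window.

    kind: "event" for event-triggered traffic (asynchronous window) or
    "time" for time-triggered traffic (synchronous window).
    """
    phase_us = time_us % cycle_us
    if kind == "event":
        return phase_us + tx_us <= async_len_us
    if kind == "time":
        return phase_us >= async_len_us and phase_us + tx_us <= cycle_us
    raise ValueError("kind must be 'event' or 'time'")


if __name__ == "__main__":
    # 1000 us cycle whose first 400 us are reserved for event traffic.
    print(window_at(2399, 1000, 400))
    print(may_send("event", 200, 270, 1000, 400))
```

Requiring each frame to finish before its window closes is what keeps the two traffic classes from interfering with each other in this sketch.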
In the final article, Roman Nossal and Roland Lang present "Model-Based System Development—An Approach to Building X-by-Wire Applications." This approach uses models as the input to a synthesized system design based on a proposed "A" model for top-down system synthesis that complements the well-known "V" system design cycle model.
Beyond the issues discussed in these articles, further system-level issues require solutions before manufacturers can successfully deploy X-by-wire systems. In some cases, the difficulties involve logistics issues or scaling issues rather than research issues, but all must be addressed to produce safe, cost-effective X-by-wire systems.
The methods currently used by the fault-tolerant computing community, mostly in relatively low-volume applications such as aircraft, must adjust to new demands. Some of the underlying assumptions change when dealing with the automotive market, such as the training level of equipment maintainers, and even the ability of regulatory agencies to monitor and enforce policies across a fleet of hundreds of millions of cars.
Tool-based design methods help designers exploit sophisticated technology. The tools themselves, however, must be trusted to produce defect-free outputs—in effect making the tool chain part of the safety-critical system design. Ensuring that a scheduling tool does not have bugs that produce incorrect schedules requires a much higher level of software tool quality and, possibly, certification than is commonly seen in current design tools.
X-by-wire systems require much more sophisticated service tools than mechanical systems. X-by-wire will take electronic-based test equipment, already commonplace in repair shops, to a new level. Visual inspections of safety-critical systems (for example, spotting cracks or fluid leaks) won't suffice. A timing-jitter problem of a few microseconds in a network schedule might prove more difficult to detect than dripping hydraulic fluid unless specialized tools are available. Electronic diagnostics must ensure reliable and timely message delivery and that spare parts from warehouses (potentially from third-party vendors) are interoperable and acceptable for safe operation.
It seems inevitable that critical-control networks will eventually connect to the Internet, either directly or with an indirect connection via the vehicle's infotainment system. Reasons to link an X-by-wire network and the Internet include real-time system diagnosis, automated software defect correction and upgrades, and road condition monitoring. This connection, however, creates the possibility of terrorists or others gaining control of the system to disrupt vehicle operation or even induce accidents. The security models of vehicle system architectures currently tend to assume that embedded control networks do not need internal authentication but, rather, rely on a firewall. The validity of this assumption has yet to withstand the test of time.
Software-intensive desktop computer systems have acquired a reputation for having nonrobust, or even just plain buggy, software. What reason is there to assume that vehicle control software will be exempt from the same forces that make all complex software difficult to get right? A shift to X-by-wire gives more control to software in safety-critical operations than ever before. This results in a dramatically increased need to validate and verify the reliability of this software. Associated legal and regulatory issues will surely come into play as well.
While X-by-wire technology holds great promise, researchers and industry continue to work out the details of implementation, standardization, and deployment, even as the technology is implemented. We are fortunate to be able to present articles that describe the state of the art in this area at a time when the automotive industry is in the middle of its protocol selection process and about to transition to full-scale production of such systems.
The FlexRay (http://www.flexray.com) protocol is another standard contender for X-by-wire designs. FlexRay is a hybrid protocol that allocates portions of network time to both a time-triggered protocol and to prioritized message access. While the CAN prioritization scheme is based on dominant and recessive bit-values, FlexRay uses timing offset values proportional to priority. As its name suggests, FlexRay adds flexibility, permitting coexistence of both prioritized and time-triggered messages on the same network. At a high level, that approach has similarities to the train communication network (TCN) described in a previous IEEE Micro issue, 1 as well as FTT-CAN.
Reference
1. H. Kirrmann and P.A. Zuber, "The IEC/IEEE Train Communication Network," IEEE Micro, vol. 21, no. 2, Mar./Apr. 2001, pp. 81-92.
A central philosophical difference among protocol approaches is the degree to which they recast things into a framework supported by fault-tolerant-computing theoretical underpinnings. Rushby illustrates this by comparing safety-critical protocols in terms of designing systems that can tolerate so-called Byzantine (arbitrary) faults using distributed consensus models. 1 Because Byzantine-type failures are the most general class of failure under a certain set of assumptions, many believe that any system that can tolerate Byzantine failures can tolerate any other failure, whether considered by the designer or not. The TTA approach is fundamentally based on this point of view. Proponents of other approaches, however, argue that the strict replica determinism requirements are restrictive and adopt less stringent fault models in exchange for increased design flexibility, especially in prioritized messaging and scheduling. If other protocols want to be able to tolerate Byzantine failures, they must provide that capability in software above the protocol layer. If they do not tolerate Byzantine failures, then they must demonstrate they can tolerate failures relevant to safe operation of a vehicle.
While the Byzantine failure approach does cover arbitrary runtime anomalies, neither it, nor any of the other approaches discussed in these articles, covers software design defects in which participating nodes run copies of the same program having a shared implementation or specification defect. Thus, designers of these systems must get their software right. Whichever protocol eventually becomes the standard for X-by-wire, it must use an appropriate fault model and ensure proper message delivery to safety-critical systems.
Reference
1. J. Rushby, "Bus Architectures for Safety-Critical Embedded Systems," Proc. 1st Workshop Embedded Software (EMSOFT), Lecture Notes in Computer Science, vol. 2211, Springer-Verlag, Heidelberg, Germany, 2001, pp. 306-323.
Special thanks are due to the anonymous reviewers of this issue who provided timely, detailed, and very helpful comments to the authors.