Large and complex information systems permeate society's infrastructure. Vulnerable to attack and failure, these systems are built upon fragile embedded hardware and software. The research and the programs described here attempt to address these issues.
Information systems are now integrated into the fabric of every complex system that supports advanced society. They permeate society's infrastructures: Telecommunications, the electric grid, banking and financial services, manufacturing, surface transportation, petroleum delivery, and emergency services provide just a few examples. Over the past few decades, such systems have become increasingly automated. Their ability to operate now depends on ubiquitous, embedded information subsystems, both large and small.
Our information-intensive infrastructures suffer from two new vulnerabilities. First, they can be attacked by hackers, ideologues, terrorists, and organized crime.
Second, systems have become fragile. Centrally administered infrastructure systems that use highly distributed computing systems have reduced costs, increased efficiency, and allowed massive reductions of field personnel. Consequently, the built-in fault tolerance and backup manual operating modes made possible by field personnel are now degraded or gone. New modes, functions, and services operate at electronic, not manual, speeds. Cybercomponents may even accelerate the spread of damage when it occurs, reducing the opportunity for manual reaction. Deregulation further complicates the problem. It moves control from government-regulated monopolies to multiple private corporations, hindering a nation's ability to arrest widespread disruption with a coordinated response.
We have coined a new phrase, critical infrastructure protection, to describe a new concern. In 1996, US President Bill Clinton convened the President's Commission on Critical Infrastructure Protection. The Commission's purpose was to conduct a comprehensive review and recommend a national policy for protecting critical infrastructures and assuring their continued operation. In 1997, the Commission reported that "waiting for disaster is a dangerous strategy" [1] and called for immediate and comprehensive action.
Subsequently, Clinton issued Presidential Decision Directive 63 (PDD-63) in 1998 [2]. This directive reflected many of the Commission's recommendations and established the Critical Infrastructure Assurance Office (CIAO) and the National Infrastructure Protection Center (NIPC) described in the accompanying sidebars. It also initiated agency-wide programs such as the DoD's Defense-Wide Infrastructure Assurance Program (DIAP).
We are also concerned about the survivability of information-only systems, even the survivability of your PC and mine. But the more significant problem lies with information systems integrated into the infrastructures that control a physical system's behavior.
Building survivable information-intensive systems is a deep concern for computer scientists and engineers. Although the problem derives from our past success, it painfully shows us how much further our research must progress. Our vulnerable systems arose from a lack of needed research in architectural design, computer security, reliability, and fault tolerance. To date, investment in such research has been too low.
Yet engineers have long worked to assure the survivability of complex systems—both before and after they became information intensive. Ignoring the embedded information subsystems, many survivability concerns arise from physics or the antics of Mother Nature. For example, failures arise from metal corrosion, wire insulation meltdowns, or unexpected temperature extremes. In these cases, deterioration is not typically instantaneous, but incremental as it moves through predictable states.
Information systems introduce two new and fundamentally hard problems. First, a one-bit change may not result in an incremental adaptation of the system, but may put it in a dramatically different state that can be viewed as discontinuous. Second, a thinking adversary, not physics or weather, may be involved. The adversary seeks weak links and exploits them. These two properties make building survivable information-intensive systems more difficult and a basic research problem.
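The first of these problems, the discontinuity caused by a one-bit change, can be made concrete with a small sketch. The example below is illustrative only: it assumes a hypothetical control value (here called setpoint) stored as a 64-bit IEEE 754 floating-point number, and the helper flip_bit is introduced for this illustration rather than taken from any system discussed in this issue.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return value with one bit of its IEEE 754 double encoding inverted.

    Bit 0 is the least significant mantissa bit; bit 63 is the sign bit.
    """
    # Reinterpret the float as a 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    # Invert the chosen bit and reinterpret the result as a float.
    (flipped,) = struct.unpack(">d", struct.pack(">Q", bits ^ (1 << bit)))
    return flipped


setpoint = 1.0  # hypothetical control value, assumed for illustration

for bit in (0, 52, 62, 63):
    print(f"bit {bit:2d} flipped: {flip_bit(setpoint, bit)!r}")
```

Flipping the lowest mantissa bit changes the value almost imperceptibly, while flipping the lowest exponent bit halves it, flipping the highest exponent bit turns it into infinity, and flipping the sign bit negates it. Unlike corrosion or thermal drift, no sequence of intermediate states connects the old value to the new one.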
The Presidential Commission called for a fourfold increase in the federal government's investment in research and development. While some recommendations—such as immediate changes in government processes and development of infrastructure systems patches—have been addressed, the agencies that should seek a true solution to this problem through basic research have regrettably not done so—despite this goal being a priority for the President. Because we are not progressing as fast as we could, it is highly likely that some portion of our society will needlessly incur grave consequences before researchers find and apply the fundamental solutions.
Furthermore, given the new global economy, critical infrastructure is not a specifically national issue—infrastructure issues have become transnational in character.
The critical challenge of assuring that we will have survivable systems led Computer's editors to dedicate this issue to the topic, presenting articles that approach the problem from quite different perspectives. Consequently, the solutions proposed for guaranteeing the survivability of systems, as well as for understanding and managing complexity through modeling and simulation, differ too.
Electric power grid
Massoud Amin's "Toward Self-Healing Infrastructure Systems" examines national infrastructures as complex interactive networks. He provides examples of the stresses being placed upon our power, telecommunications, transportation, and other infrastructures as well as potential external threats to them. For example, the August 1996 cascading power blackout in 11 US states and two Canadian provinces resulted from faults in a 500-kilovolt line in Oregon that generated excess load and tripped generators at McNary Dam. This, in turn, caused 500-megawatt oscillations that separated the North-South Pacific Intertie near the California-Oregon border. Experts now believe that shedding 0.4 percent of total load on the network for 30 minutes would have prevented the cascading effects, but load shedding is not typically a first-response option.
The Electric Power Research Institute (EPRI) is exploring ways of building "intelligence" into the electrical power infrastructure so that real-time analysis can be applied to the electric grid network to take proactive, self-stabilizing actions. Network components must mitigate and localize the effects of local failures and, if that fails, play an appropriate part in coordinated global reactions.
EPRI is developing intelligent electronic devices that combine sensors, computers, telecommunications units, and actuators to monitor and control the network, detect the system's true dynamics, predict a priori what kinds of problems may arise from specific failure types, take stock of options to mitigate prospective failures, and take action.
EPRI has played a leadership role in this area for some time, collaborating with the Department of Defense to fund long-term university research on complex interactive networks and systems. Such a partnership was arranged under the Government Industry Collaborative University Research project, and today DoD and EPRI equally fund a five-year, $30 million program, with funded university work already under way.
Users who run mission-critical applications on the Internet will require functionality unlikely to be available on the Next-Generation Internet. Some critical networked applications, glued together from component systems, exploit aspects of physical or logical separation. Although each application was designed independently, when integrated these systems lose that separation. Potentially, some assumptions about physical or logical isolation no longer hold, compromising rationales about safety properties. We thus need some notion of isolation that can be relied on for applications in shared network settings.
In his "The Next-Generation Internet: Unsafe at Any Speed?" Kenneth P. Birman introduces two case studies to illustrate the problem: intensive-care computing and integrated modular avionics. He proposes a new networking isolation capability, a virtual overlay network, which offers a basis for assuring reliability and security properties of critical applications. Birman points out that virtual overlay networks would be prohibitively costly to implement using contemporary technologies. He argues, however, that they can be supported at low cost and with good scalability, simply by extending an existing router feature and coupling it with well-understood group communication techniques.
Survivable information storage
As society becomes increasingly reliant upon digital information, information system users must be able to store their critical information with utmost confidence. They must be assured of their stored information's integrity, confidentiality, and continuous availability. For a storage system to be viewed as survivable, it must provide these guarantees over time. This unavoidably requires the replication and distribution of data and services across many storage nodes, because some subset of them may ultimately be (maliciously) compromised.
To meet this need, Jay J. Wylie and colleagues propose their PASIS architecture, described in "Survivable Information Storage Systems." Starting with the assumption that no individual service, node, or user can be fully trusted, they combine extant technologies with techniques for decentralizing storage services using threshold schemes, redundantly partitioning data, and performing dynamic self-maintenance.
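To give a sense of what a threshold scheme does, the sketch below implements one well-known example, Shamir's secret sharing, in which any k of n shares reconstruct a value and fewer than k reveal nothing about it. This is an assumption made for illustration: the description above does not say which threshold schemes PASIS uses, and the names PRIME, split, and recover are hypothetical. It requires Python 3.9 or later.

```python
import secrets

# Field modulus: the Mersenne prime 2**127 - 1. Secrets must be smaller than this.
PRIME = 2**127 - 1


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split secret into n shares such that any k of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    # Each share is a point (x, f(x)); one share would go to each storage node.
    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover(shares: list[tuple[int, int]]) -> int:
    """Recover the secret from at least k shares by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse of den.
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split(123456789, k=3, n=5)
print(recover(shares[:3]) == 123456789)  # any three shares suffice
```

In a storage setting, each share would be held by a different node, so the loss or compromise of up to n - k nodes neither destroys the data nor, for fewer than k colluding nodes, exposes it. The price is storage overhead, since every share here is as large as the secret itself.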
Going after the bad guys
Donald E. Brown and colleagues' "Interactive Analysis of Computer Crimes" takes a different tack from the preceding articles. It describes the application of data analysis methods to link and associate computer crimes and to develop profiles of computer criminals based on incident data. More important, they extend this data analysis methodology with multiagent models that predict the behavior of computer criminals. While the databases they work with today relate mainly to conventional crime, it is appropriate to begin to develop the tools and techniques to profile computer criminal behavior and to simulate possible attacks that relate suspicious events across time and space.
Learning how to build survivable systems is laudable. But it is also prudent to develop the techniques to detect and take action against those who attack our systems.
Some people question whether real problems can or even should drive basic research. The challenge of building survivable information-intensive systems is an excellent example of a very real problem whose only solution is research. By Presidential directive, every US Government agency has a highly visible team working on its particular instance of the problem. But the ultimate solution is not patches, filters, watchdog monitors, replicated systems, manual procedures, more security classification, and disconnection from networks. It lies in finding fundamentally new system architectures; identifying new and better approaches to assure security, reliability, and fault tolerance; and inventing new ways to make systems self-aware in a way that they have not been in the past. Although the articles in this issue make a contribution, assuring the survivability of information-intensive systems remains a worthy research challenge.
is the Quarles Professor of Engineering and Applied Science in the Computer Science Department at the University of Virginia. She served as the director of Defense Research and Engineering at the Department of Defense from 1993 to 1997 and is currently vice chair of the National Science Board. Contact her at firstname.lastname@example.org.