Guest Editor's Introduction: The Image Understanding Program at ARPA

Oscar Firschein, Advanced Research Projects Agency

Pages: pp.8-10

This issue of IEEE Expert focuses on the Image Understanding program at the US Defense Department's Advanced Research Projects Agency. ARPA, established in 1958 partly in response to the launching of the initial Sputnik satellite, is charged with developing imaginative, innovative, and often risky research ideas—ideas that are expected to have technological consequences significantly beyond normal evolutionary development approaches.

Image understanding is the art, science, and technology of developing computer algorithms that use imaging-sensor data to create descriptions of the world suitable for particular purposes. For autonomous vehicle navigation, a description might be an indication of road edges or obstacles for use by the vehicle's steering system. For an intelligence application, the description could indicate changes of military significance to a site for use by an intelligence image analyst.

IU analyzes 2D arrays of values from imaging sensors that are measurements of properties—such as intensity, range, or phase—of each element in a scene. Some sensors, such as television cameras, produce a time sequence of such arrays. The translation from an array of numbers to meaningful objects must overcome object occlusion, shadows, reflections, and other disturbances. Contextual information, such as knowledge of the domain being sensed, is often needed to aid this translation.
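As a rough illustration of the data described above, the following sketch represents an image as a 2D array of measurements and a video as a time sequence of such arrays. The names `Image`, `bright_pixels`, and the sample values are hypothetical, chosen only for this example, and the simple threshold stands in for the far more elaborate processing a real IU system would need.

```python
from typing import List, Tuple

# An image is a 2D array: rows of per-element measurements
# (here, intensity values between 0 and 1; purely illustrative).
Image = List[List[float]]


def bright_pixels(image: Image, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, column) positions whose value exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > threshold
    ]


# A hypothetical 3x4 intensity image with a bright region in the middle row.
frame0: Image = [
    [0.1, 0.1, 0.2, 0.1],
    [0.1, 0.9, 0.8, 0.1],
    [0.1, 0.1, 0.2, 0.1],
]

# The same scene one time step later: the bright region has moved right.
frame1: Image = [
    [0.1, 0.1, 0.2, 0.1],
    [0.1, 0.1, 0.9, 0.8],
    [0.1, 0.1, 0.2, 0.1],
]

# A sensor such as a television camera yields a time sequence of arrays.
sequence: List[Image] = [frame0, frame1]

# A very crude "description" of each frame: where the bright elements are.
descriptions = [bright_pixels(frame, 0.5) for frame in sequence]
```

A list of bright positions is still far from a meaningful object. Occlusion, shadows, and reflections would all defeat a threshold this simple, which is why contextual knowledge is usually needed.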

The ARPA IU program began with a workshop in May 1975 to determine whether IU research was sufficiently mature to justify an ARPA investment. Sixty people attended, and the resulting proceedings, a poorly mimeographed, 70-page collection of papers, emphasized photographic interpretation, cartography, real-time tracking, and imagery in remotely piloted vehicles. Over the past 20 years, the ARPA IU program has grown to be the major funder of IU research and development in the US. The most recent IU workshop, in 1994, had an attendance of 600, and the proceedings 1 was a two-volume, 2,400-page document that appeared on CD-ROM as well as on line (

The program's long-term goal is to develop computational theories and techniques for use in artificial vision systems whose performance matches or exceeds that of humans, exploiting sensing throughout the breadth of the electromagnetic spectrum, in all environments. The shorter-term goals are to carry out applications-directed research on machine vision, provide a suitable IU software environment, and exploit IU capabilities. Figure 1 shows the transition from research to application in the ARPA IU program.


Figure 1. The ARPA Image Understanding program, showing the transition from research to application.

IU research has addressed problems in the phenomenology of the sensing process, fusion of diverse sensor data, use of learning processes in IU, use of speech and natural language in aiding the interactive IU process, and incorporation of contextual knowledge and reasoning into the IU process. Current applications include:

Interactive target detection/recognition. Autonomous air vehicles provide a huge volume of tactical synthetic aperture radar imagery, straining current abilities to carry out timely ground-based analysis. Interactive IU systems will play a strong role in solving this problem: the IU algorithms cue the image analyst to suspected targets, and the analyst selects the most likely candidate targets for further IU examination.

Reconnaissance, surveillance, and target acquisition. RSTA techniques for an unmanned ground vehicle rely on advanced filtering techniques for target detection and model-based analysis for target identification. The analysis often requires fusion of information from sensors of different types—electro-optical, infrared, and laser.

Radius. Designed to improve the effectiveness of the intelligence image analyst by providing semiautomated and automated exploitation tools, the Radius project relies on the concept of a 2D or 3D site model that IU algorithms use for change detection, counting, and visualization.

Image understanding environment. This software environment for supporting IU research and development provides a platform for making IU algorithm design more effective and for sharing algorithms and data. The IUE will support various application scenarios, including photo interpretation, smart weapons, navigation, and industrial vision.

Vision-based cartographic model construction. Many current and future Defense Department programs require the generation and maintenance of accurate map data to support decision-making by humans or intelligent autonomous agents. Techniques of vision-based cartographic model construction can update and provide additional detail to map data, based on satellite and aerial photography.

Construction of simulation databases. The increasing importance of visualization in simulation systems for training and mission rehearsal creates a critical need for rapid construction of accurate, up-to-date spatial databases of battlefields. Today's manual database-construction process is the bottleneck preventing the widespread adoption of simulation throughout the military. IU techniques can shorten the timeline required to build a simulation database and improve the construction accuracy.

Semiautomated image annotation. In this project, an intelligent agent views an image, speaks into a microphone, and points to objects in the image. A speech-understanding system converts the speech to a natural language narrative; an IU/NL system then uses the narrative and the pointing data to outline and label objects in the image. The natural language narrative and the annotated image accompany the image as collateral in the database, and can be used for retrieval. This approach finds use in such diverse applications as the annotation of intelligence imagery and of medical radiographs.
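Several of the applications above, Radius in particular, involve change detection. The sketch below shows only the simplest form of the idea: comparing a new image against a reference, element by element. It is not the Radius method, which relies on 2D or 3D site models. The function name `changed_cells`, the tolerance, and the sample values are assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of per-element measurements (illustrative)


def changed_cells(reference: Image, current: Image, tolerance: int) -> List[Tuple[int, int]]:
    """Return (row, column) positions where the two images differ by more than tolerance."""
    same_shape = len(reference) == len(current) and all(
        len(ref_row) == len(cur_row) for ref_row, cur_row in zip(reference, current)
    )
    if not same_shape:
        raise ValueError("reference and current images must have the same shape")
    return [
        (r, c)
        for r, (ref_row, cur_row) in enumerate(zip(reference, current))
        for c, (ref_value, cur_value) in enumerate(zip(ref_row, cur_row))
        if abs(ref_value - cur_value) > tolerance
    ]


# Hypothetical earlier and later views of the same site.
earlier: Image = [
    [10, 12, 11],
    [10, 11, 12],
]
later: Image = [
    [10, 13, 11],   # small variation, within tolerance
    [10, 200, 12],  # large change worth flagging to an analyst
]

flagged = changed_cells(earlier, later, tolerance=5)
```

In an interactive setting, positions flagged this way would be cues for the image analyst to review, not conclusions. Lighting and viewpoint differences alone can produce large element-wise changes, which is one reason a site model is useful.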

The future

Digital imagery is proliferating in both the military and commercial sectors. New imaging sensors exploit all parts of the electromagnetic spectrum, and are deployed on every form of aircraft, ground vehicle, and stationary platform imaginable. Desktop scanners producing high-resolution digital images are commonplace. Video cameras are small and cheap, making them suitable for a limitless number of applications. Thus, the quantity and variety of digital imagery and its applications are exploding, which will surely fuel demands for automated interpretation processes and machine vision products not yet imagined. The ARPA IU program intends to spur the development of new IU techniques to meet this demand.

The computational infrastructure to support the large-scale interpretation of digital imagery is already in place. The growth of computing power has made it feasible to consider delivering IU techniques on personal computers. Increased storage and network capacity make it feasible to manipulate large digital images and even video sequences on hardware already found in most workplaces.

As the quantity of imagery increases, so does the desire to automatically index, access, interpret, and understand that imagery. The challenge to the IU community is not so much in finding clever solutions to the established problems in computer vision, but in finding clever applications of computer-vision techniques in light of the explosion of digital imagery.


ARPA has been a key supporter of IU research and development for the past 20 years. We hope that this important presence can be maintained in the future.

I thank the authors for their efforts, IEEE Expert referees for their suggestions and comments, Expert's staff for effective editing and production help, and Steve Cross and the Editorial Board for considering our subject area.

Image understanding

This issue highlights some of the ARPA IU projects:

The first article describes two defense-oriented projects: Radius, a system for aiding the image analyst in photo interpretation, and UGV-RSTA, a project in reconnaissance, surveillance, and target acquisition for an unmanned ground vehicle.

The past few years have seen an impressive surge in the capabilities of medical sensors, particularly those that result in 3D images (magnetic resonance imagery and tomographic imagery). W.E.L. Grimson of MIT's AI Laboratory describes an end-to-end system for image-guided surgery, which directly builds on a wide range of IU methods to provide a surgeon with visualization and guidance during surgical procedures.

Through the use of novel imaging devices called polarization cameras, polarization is emerging as a new general approach to image understanding and computer vision. Although human vision is oblivious to components of light polarization, polarization parameters of light provide an important visual extension to intensity and color. Lawrence B. Wolff of the Computer Vision Laboratory at Johns Hopkins University describes how the polarization camera operates, and presents results in natural object recognition, automatic target detection and recognition, inspection of ship hull damage, and marine biology.

The Image Understanding Environment is a five-year program sponsored by ARPA to develop a common software environment for the development of algorithms and application systems. The project's ultimate goal is to provide the basic data structures and algorithms needed to implement IU systems and application prototypes. In the December IEEE Expert, the IUE Committee and the IUE Development Team will review this system's design and indicate its current status.

In the final article, which will appear in the next issue of IEEE Expert, Kanade discusses future applications of IU. Some of the exciting new directions are in virtual reality, and in image retrieval from databases of single images and video sequences. In virtual reality, a user interacts with a computer-constructed world either for entertainment purposes or to practice a useful simulation, such as a medical simulation of a surgical procedure. In image retrieval, a user requests an image or sequence of images whose content satisfies a query.


About the Authors

Oscar Firschein was the manager of ARPA's Image Understanding Program from 1991 to 1995 and is currently a free-lance consultant. His research interest is in image database storage and retrieval. He received a BEE from City College of New York and an MS in applied mathematics from the University of Pittsburgh. He coauthored Intelligence: The Eye, the Brain, and the Computer and Readings in Computer Vision. He is a member of the IEEE, ACM, and AAAI, and is the AI category editor for ACM Computing Reviews. Readers can contact him at 29 Stowe Lane, Menlo Park, CA 94025;