Augmented Reality (AR) is rapidly becoming one of the best-known buzzwords associated with future user interfaces. Its name recognition has accelerated further over recent months, thanks to the announcement of Google’s Project Glass, whose eyewear display prototype the popular press often incorrectly refers to as exemplifying AR. But what does AR really mean? It refers to integrating virtual media with our perception of the physical environment, extending what we experience. Furthermore, those virtual media are geometrically aligned with the real world (researchers refer to this as registration) and experienced interactively, so that both are perceived as occupying a common space as the user moves. For example, a virtual animated character might appear to be seated on a real physical chair. In that case, the character must be rendered from the user’s current point of view, even as that view changes. Although AR often focuses on visual augmentations, it can be created for any and all of our senses, including hearing, touch, taste, and smell.
Although the term is barely more than 20 years old, the first AR system actually dates back well over 40 years to the work of Ivan Sutherland, who developed a see-through stereoscopic head-worn display whose 3D position and orientation were tracked by a computer. His head-worn display used optics to overlay the user’s view of the world with a 3D computer graphics model that looked like it was part of the environment. Since then, researchers have also built AR systems that use cameras to capture live video of the real world, which a computer combines with rendered graphics to present on a display. Other AR systems use video projectors to project graphics directly on objects in the environment. In all of these, the display components can be worn on the head, held in the hand, or mounted in the environment.
Part of the excitement about AR is that it’s a major step beyond mobile and location-aware computing. If a conventional smartphone app displays content that is related to the surrounding world, both are still spatially separate. In contrast, AR merges the real world and the computer content, so that we can literally see the content in our surroundings through the display, which means we can situate potentially useful information exactly where it’s needed.
We begin our exploration of this month’s theme with some examples of application areas in which AR can be especially valuable. “First Deployments of Augmented Reality in Operating Rooms,” by Nassir Navab and his colleagues, builds on more than two decades of research into applying AR to surgical visualization, beginning with pioneering work done at the University of North Carolina at Chapel Hill. This article describes several medical AR systems developed at the Technical University of Munich that are regularly used in multiple operating rooms during surgery. In fact, one of them is now a commercial product.
In contrast to the advantages AR can provide for highly trained surgeons, “Augmented Reality in the Psychomotor Phase of a Procedural Task” (video) shows how AR can assist users performing equipment maintenance and assembly tasks. In this article, Steven Henderson and I introduce a system that uses overlaid graphics and text to guide users through an assembly task. Our user study shows that even participants with little or no experience performing the specific tasks can complete them in significantly less time and with significantly fewer mistakes when using AR on a see-through head-worn display than when using interactive computer-based instructions presented on a conventional flat-panel display.
One AR domain that’s particularly well represented on smartphones is environmental browsing for tourism and sightseeing. Such apps use a smartphone’s GPS to determine the user’s location; they use the phone’s electronic compass, accelerometers, and gyroscopes to determine the user’s orientation; and they then overlay the video from the smartphone’s camera with textual labels and graphics representing points of interest around the user. Sixteen years ago, my lab developed a bulky backpack-based wearable outdoor AR research system that used similar sensors to provide that sort of functionality on a see-through head-worn display. Today’s multicore, high-resolution smartphones eclipse most of the hardware we used, but many of the visualizations we and other outdoor AR researchers developed are quite similar to those in current commercial applications. By comparison, Christian Sandor and his colleagues’ “Egocentric Space-Distorting Visualizations for Rapid Environment Exploration in Mobile Mixed Reality” (video) presents two intriguing ways of visualizing points of interest that are offscreen or occluded by other objects, accomplished by processing textured 3D models of the environment. One visualization warps the user’s view of the environment to bring an offscreen building into view. The other metaphorically “melts” occluding buildings to uncover a desired building, and then brings the building’s model closer to the viewer for inspection. Although this work was done just a few years ago using a belt-worn computer running a desktop operating system, advances in smartphone hardware, spurred on by the thirst for 3D apps, are rapidly eliminating the need for a separate computer.
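To make the sensor fusion concrete, here is a minimal sketch of how such a browser might decide where on screen to draw a point-of-interest label, given the GPS fix and the heading from the compass and inertial sensors. The function names, screen width, and field of view are illustrative assumptions, not taken from any particular app:

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from (lat1, lon1) to (lat2, lon2), in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def label_x(user_lat, user_lon, heading_deg, poi_lat, poi_lon,
            screen_w=1080, fov_deg=60.0):
    """Horizontal pixel at which to draw a POI label over the camera video,
    or None if the POI lies outside the camera's horizontal field of view."""
    # Angle of the POI relative to where the camera is pointing, in (-180, 180].
    rel = (bearing_deg(user_lat, user_lon, poi_lat, poi_lon)
           - heading_deg + 540.0) % 360.0 - 180.0
    if abs(rel) > fov_deg / 2:
        return None  # POI is off screen
    # Map the relative angle linearly across the screen width.
    return int(screen_w / 2 + rel / (fov_deg / 2) * (screen_w / 2))
```

A real browser would also use the POI’s distance to scale the label and the device’s pitch to place it vertically, but the core idea is the same: convert sensor readings into an angle relative to the camera axis, then into screen coordinates.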
The last two articles address some of the key technological issues that AR raises. Robust, commercially available approaches already exist for interactively determining a smartphone’s 3D position and orientation by aiming its camera at a known image, such as a printed poster. The software looks for features: visually distinctive patterns at known locations in the image that have been identified during a preprocessing phase. If the software recognizes a sufficient number of these known features, the system can compute the smartphone’s position and orientation and use that information to overlay virtual graphics relative to the poster. But what if the environment isn’t already known to the system? Researchers had previously developed methods to find and track relatively small numbers of high-quality visual features in new environments. Five years ago, “Parallel Tracking and Mapping for Small AR Workspaces” (video), by Georg Klein and David Murray, introduced an approach that built on that earlier work but instead opted for a larger number of simpler, faster-to-recognize features, achieving real-time performance on relatively low-end hardware. The field has seen advances on this work since then, but these results are still impressive!
Finally, we point toward the future with “KinectFusion: Real-Time Dense Surface Mapping and Tracking” (video). In this article, Richard Newcombe and his colleagues use a Microsoft Kinect, which contains both a color camera and a depth camera that can estimate the distance to points in the environment. This allows a moving system, running on a desktop computer, to build up a textured 3D virtual model of its environment interactively as it determines its position and orientation in that environment. The detailed model lets virtual objects appear to physically interact with the environment, as shown in the video.
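The building block underlying such dense mapping is back-projection: each depth-camera pixel, together with the camera’s intrinsics, yields a 3D point in the camera frame, and these points are what get fused into the surface model. A minimal numpy sketch, using illustrative intrinsics that are not the Kinect’s actual calibration:

```python
import numpy as np

# Assumed pinhole intrinsics for a depth camera (illustrative values).
FX, FY, CX, CY = 580.0, 580.0, 320.0, 240.0

def backproject(depth_m):
    """Turn an HxW depth image (metres) into an HxWx3 array of 3D points
    in the camera frame; zero depth maps to the origin."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel column, row
    x = (u - CX) / FX * depth_m
    y = (v - CY) / FY * depth_m
    return np.stack([x, y, depth_m], axis=-1)
```

A system like the one the article describes repeats this every frame, aligns the new points against the model built so far to track the camera, and merges them into that model.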
I hope this has given you a taste of the exciting things that are happening in AR. If you’re interested in learning more, the key research conference in this field, the International Symposium on Mixed and Augmented Reality (ISMAR), takes place 5–8 November 2012 in Atlanta, Georgia.
Steven Feiner is a professor of computer science at Columbia University, where he directs the Computer Graphics and User Interfaces Lab. His research interests include human–computer interaction, augmented reality and virtual environments, 3D user interfaces, knowledge-based design of graphics and multimedia, mobile and wearable computing, computer games, and information visualization. Contact him at feiner [at] cs dot columbia dot edu.