, MIT Media Lab
Pages: pp. 26-27
Processing power for computer graphics has increased enormously in recent years and will continue to do so as we ride the exponentials derived from Moore's Law. The accelerated transfer of workstation capabilities to plug-in cards and software applications running on standard PCs moves detailed rendering outside the domain of high-end machines. Virtual worlds are now versed in the basic properties of Newtonian physics, and animated graphical entities are likewise slowly becoming sentient as researchers build them into intelligent agents endowed with layered, autonomous behaviors. Most other areas of computer graphics, such as modeling software, have advanced as well.
Common input devices and interfaces, however, have remained relatively static. Most graphics designers and users still rely on variants of the same mouse, keyboard, and stylus that have been with us since the exodus of the keypunch. These narrow channels of expression limit the growing possibilities. Although we now have much more vivid virtual environments (VEs), most of us still build and interact with them through a single point of contact on a passive 2D plane. Higher-end applications have exploited more expensive and sophisticated interface hardware, such as the instrumented gloves and magnetic tracking systems used for motion capture. Although these devices are being freed from hardwired tethers to a base station, users remain "straitjacketed" as they wire their multiple pickups to a belt pack.
In the near term, as manufacturers build ever more processing power, graphics capability, and network bandwidth into conventional PCs, we'll need low-cost, noninvasive, multimodal interface technologies to make full use of many revolutionary new applications. (For example, we can't expect users to spend much time in networked, distributed VEs when they must drive their avatars with a keyboard and mouse. Likewise, it's impractical to expect users to wear anything like today's magnetic trackers just to shrug at an intelligent agent.) This need will become even more acute as tomorrow's computers break out of their current desktop form and reach into the environment. In the upcoming world of smart rooms, intelligent objects, ubiquitous displays, and wearable computing, embedded sensors will monitor many channels of a user's actions (visual, auditory, tactile, proximity, physiological, and so on). Data from these different systems will converge across local networks and fuse to produce a dynamic, multimodal "input device."
The five articles in this special issue explore several interface technologies that spotlight a few keystones along these paths. Some target the graphics designer, while others serve users interacting with responsive environments.
One well-established technology in the graphics community is laser scanning, which captures the texture and geometry of 3D surfaces. Several methods exist to cut the expense and complication traditionally associated with these systems, opening them up to other applications. As an example, we've developed a low-cost, phase-measuring laser rangefinder at the MIT Media Lab 1 that we use for real-time tracking of bare hands in a plane above a projected video wall. This device allows direct interaction with large-screen graphics that is largely insensitive to ambient lighting.
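The internals of the Media Lab rangefinder are not described here, so the following is only a hedged sketch of the general phase-measurement principle. It assumes an amplitude-modulated beam with a hypothetical modulation frequency `mod_freq_hz` and a scanning mirror whose angle `theta_rad` is known for each return; neither value comes from the actual device, and all function names are illustrative. Because the light makes a round trip, the measured phase lag gives the range as d = cΔφ / (4πf), and the scan angle then places the hand in the plane of the wall.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def range_from_phase(phase_rad: float, mod_freq_hz: float) -> float:
    """Estimate target distance from the phase lag of an amplitude-modulated beam.

    The light covers the round trip 2*d, so the lag is 2*pi*f*(2*d/c);
    solving for d gives c*phi / (4*pi*f).
    """
    # Phase is only known modulo one cycle, so targets farther than
    # ambiguity_interval() alias back into the measurable range.
    phi = phase_rad % (2 * math.pi)
    return SPEED_OF_LIGHT * phi / (4 * math.pi * mod_freq_hz)


def ambiguity_interval(mod_freq_hz: float) -> float:
    """Largest distance measurable before the phase lag wraps a full cycle."""
    return SPEED_OF_LIGHT / (2 * mod_freq_hz)


def hand_position(phase_rad: float, theta_rad: float,
                  mod_freq_hz: float) -> tuple[float, float]:
    """Convert one (phase, scan angle) sample into planar x, y coordinates.

    Assumes the scanner sits at the origin and theta_rad is the
    (hypothetical) mirror angle at which the return was detected.
    """
    r = range_from_phase(phase_rad, mod_freq_hz)
    return r * math.cos(theta_rad), r * math.sin(theta_rad)


if __name__ == "__main__":
    f = 25e6  # hypothetical 25 MHz modulation frequency
    print(f"unambiguous range: {ambiguity_interval(f):.3f} m")
    print(hand_position(math.pi / 2, math.radians(30), f))
```

With the assumed 25 MHz modulation, the phase wraps only after about 6 m, enough to cover a wall-sized area; a higher frequency would give finer range resolution for the same phase precision, at the cost of a shorter unambiguous range.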
Two articles in this issue take very different tacks on laser scanners for object capture. The first, by Petrov et al., gives a succinct portrait of state-of-the-art triangulation scanners. The authors describe the principles behind their Galatea scanner, which is exceptional for its low cost, high speed, and photorealistic response. In contrast, Borghese et al. take a minimalist approach with their Autoscan system. They discard the entire scanning mechanism in favor of a simple hand-held laser pointer with which the user "paints" the object of interest while a stereo pair of video cameras observes. A commercial real-time image-processing board finds the laser spot in both images and produces its 3D coordinates. Although much slower than an automated scanner, Autoscan is potentially very inexpensive, and the scanning details are entirely in the hands of the user, so to speak.
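How Autoscan turns the two spot positions into 3D coordinates is not spelled out here. A common textbook approach, shown only as a hedged sketch, assumes an idealized rectified stereo pair: identical cameras with parallel optical axes, a focal length `focal_px` in pixels, a baseline `baseline_m` between the camera centers, and image coordinates measured from each principal point. Depth is then inversely proportional to the horizontal disparity of the spot, Z = f·B / disparity. The real system may well use a fully calibrated general camera model instead; the function and parameter names are hypothetical.

```python
def triangulate_rectified(
    left_xy: tuple[float, float],
    right_xy: tuple[float, float],
    focal_px: float,
    baseline_m: float,
) -> tuple[float, float, float]:
    """Recover (X, Y, Z) in meters, in the left camera's frame, for a spot
    seen at left_xy and right_xy (pixels, relative to each principal point)."""
    xl, yl = left_xy
    xr, yr = right_xy
    disparity = xl - xr  # horizontal shift of the spot between the two views
    if disparity <= 0:
        raise ValueError("spot must lie in front of both cameras (disparity > 0)")
    z = focal_px * baseline_m / disparity
    # In a rectified pair the spot falls on the same row in both images;
    # averaging the two y values absorbs small detection noise.
    y = 0.5 * (yl + yr)
    return xl * z / focal_px, y * z / focal_px, z


if __name__ == "__main__":
    # Hypothetical rig: 800-pixel focal length, cameras 20 cm apart.
    point = triangulate_rectified((100.0, 50.0), (20.0, 50.0),
                                  focal_px=800.0, baseline_m=0.2)
    print(point)  # (0.25, 0.125, 2.0)
```

Because Z varies as 1/disparity, a fixed one-pixel error in locating the spot produces a depth error that grows roughly with Z squared, so accuracy falls off quickly for distant surfaces.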
Although the field of machine vision has had many false starts in the past, it continues to make steady progress as more computation power is devoted to image analysis. Since the sensor hardware consists only of a simple video camera—already becoming a stock peripheral in PCs and laptops—the added expense is minimal. Because the hardware is unobtrusive (although potentially invasive from a privacy viewpoint) 2 and so much of human communication is gestural in nature, it's a natural interface for smart rooms and responsive environments. Freeman et al. provide a good picture of the promise and current capability in this area. They describe real-time gesture recognition systems they built using machine vision, including descriptions of interactive applications and special "retinal" hardware they developed to offload front-end processing overhead.
Another potential denizen of the evolving world of smart environments is electric field sensing. In the guise of "capacitive sensing," it has been with us for nearly a century in well-known devices such as elevator "touch" buttons, proximity detectors for factory automation, and the Theremin. 3 In computer interfaces, however, electric field sensing has mainly been limited to applications that require tactile contact, for example, planar touchpads that employ a dense matrix of sensing electrodes 4 and subcutaneous fingerprint capture systems. 5
The article by Smith et al. describes work done at the MIT Media Lab on noncontact electric field sensing for interfaces in computer graphics applications. Because the hardware is inexpensive and the low-frequency sensing field does not require a line of sight, these sensors can easily be embedded in a variety of "smart objects," enabling them to perceive user activity in their vicinity. The article outlines the theory and hardware behind the different modes of electric field sensing, then gives several application examples, including 3D hand sensors and track pads, proximity-sensing computer monitors, and gesture-sensitive walls for interactive projected graphics.
As the requisite hardware improves in fidelity and affordability, computers are beginning to push back at us with tactile and haptic feedback. Applications beckon in the medical field, for instance, where surgeons can hone their techniques and strategies by practicing a particular operation on simulated cadavers or can participate remotely in surgeries via telemedicine. Low-end products, such as inexpensive force-feedback joysticks and haptic "mice" through which the user can feel GUI objects, have recently appeared, 6 hinting that haptics may break into the mass market. Massie's article rounds out this issue by describing haptics applications that let computer graphics modelers "feel" their designs, taking inspiration from the way a sculptor works a block of clay.
As we head into the next century, the virtual information world will certainly become more important, requiring us to bring it closer to our sphere of interaction and perception. This will engender a shift to entirely new computer interface paradigms dominated by input devices very different from today's status quo. The articles in this special issue point toward several promising directions, but we have considerable work to do as ever-increasing computational power demands more input bandwidth and enables a fuller and more cogent digital response. You may have heard the phrase "the future's so bright, we'll have to wear shades," but now we'll have to build them with embedded displays and eye trackers!