Interacting above and beyond the Display

Andy Wilson, Microsoft Research
Hrvoje Benko, Microsoft Research

Pages: 20–21

Abstract—Advances in sensing technology are poised to spark the next shift in human–computer interaction, liberating users from the 2D plane of interaction currently supported by the mouse and touchscreen. New sensors can read users' shape and motion as they move about in three dimensions. What signal-processing algorithms and interaction models are appropriate for this mode of interaction above and beyond the screen? How can we use the more detailed, nuanced information made available by new sensors to enable more expressive interfaces, going beyond what a mouse can do but preserving its familiar predictability? The five articles in this issue deal with these questions, covering the spectrum from specialized sensing hardware to high-level interaction models, across multiple physical scales and applications.

Keywords—spatial interfaces; human–computer interaction; computer graphics; multimedia; graphics; sensors; tablet computers; stylus input; multitouch interaction

It's surprising that for all the computer mouse's popularity and utility, it reduces the entirety of the user's input to a single 2D motion in a plane. As human beings, we might bemoan this vast simplification of ourselves. We are, after all, more than a point running around on a flat display! In the real world, we use much of our bodies in everyday tasks, and we communicate powerfully by gesture, gaze, and speech. But the point cursor continues to be a useful input abstraction, even long after our machines can do much more than simple point–rectangle hit testing. The classic event-driven mouse interface is now easy to program, and the mouse's precision is hard to beat, although many have tried. When was the last time you blamed your computer when you tried to click a button and missed?

Touch interfaces expand on the bandwidth of mouse input by adding multitouch capability and ease of use through direct manipulation. Their recent success is due partly to the hardware advances necessary to rapidly sense, process, and render fluid manipulation of onscreen objects. This movement has spurred a wave of innovation in interaction models, form factors, and hardware design. Fundamentally, however, even multitouch systems model our input as a small number of contact points confined to a flat screen. Might the next leap in human–computer interaction use more complex models of input to finally liberate us from the display plane?

Transcending 2D Input

Just as the touchscreen-computing era was enabled by refined sensing and signal-processing techniques, the next shift in human–computer interaction might be driven by even more sophisticated sensing techniques. For example, by using cameras and other sensors, future interfaces might exploit knowledge of users’ 3D position and shape as they move in front of the display. Such interfaces might leverage knowledge of the user's pose to enable gesture-based input from a distance. Applications involving rendering and manipulation of 3D graphics abound, including CAD, data visualization, and augmented reality. Recently, commodity depth cameras such as the Microsoft Kinect sensor have put sophisticated 3D sensing technology within millions of computer users’ reach.

Yet sensing hardware is just one piece of the puzzle. What signal-processing algorithms and interaction models can we use to approach and exceed the touchscreen's precision, performance, and utility? How can we use the more detailed, nuanced information made available by new sensors to enable more expressive interfaces, going beyond what a mouse can do but preserving its familiar predictability?

As we explore these questions in this special issue, it becomes clear that the variety of sensing platforms and interaction models available with today's technology doesn't deliver easy answers. As we give our systems increasing capability to sense the world, we perhaps shouldn't be surprised to find that just as a tremendous variety of ways exist to interact with the real world, so too are there many modes of interaction above and beyond the display.

In This Issue

The five articles in this issue cover the spectrum from specialized sensing hardware to high-level interaction models, across multiple physical scales and applications.

Many touchscreens would have us write with our fingers, and camera-based interfaces tout the ability to interact without a hardware device in hand. However, there's still value in familiar tangible tools such as the stylus, particularly where precise input is required [1]. In “The IrPen: A 6-DOF Pen for Interaction with Tablet Computers,” Jaehyun Han and his colleagues detail an optical approach to precisely sense a stylus's position and orientation as it moves above a tablet computer. Such a device enables a variety of interesting above-the-surface interactions, such as virtually spray painting an onscreen 3D model.

Touching the screen to directly manipulate a virtual object is a well-understood interaction model [2]. However, there seem to be many more options for manipulating 3D objects using a handheld 3D input device. One difficulty is how to select a reference frame in which to apply manipulations. In “3D Object Manipulation Using Virtual Handles with a Grabbing Metaphor,” Taeho Kim and Jinah Park propose and evaluate a technique to establish such a reference frame.

Commodity depth cameras’ availability has motivated many researchers to consider implementing complex 3D interfaces that exploit hand and body tracking. At first, removing the need to hold a special device seems like a great advantage, but much work remains to match the precision and predictability of even a simple button click. In “A Multitouchless Interface: Expanding User Interaction,” Philip Krejov, Andrew Gilbert, and Richard Bowden propose hand and finger tracking techniques, as well as a gesture model to support a variety of ways to interact with a visualization on a large curved projection screen.

Depth cameras can sense the precise shape and motion of the user's body, thereby enabling many application scenarios that go far beyond even multitouch interaction. In “3D Volume Drawing on a Potter's Wheel,” Sungmin Cho and his colleagues demonstrate creative use of a depth camera's shape-sensing capability. Their system is a great example of how to use new sensing technologies and sophisticated digital signal processing to create surprisingly analog experiences.

A recurring challenge in moving interaction off the display surface is providing appropriate haptic feedback, particularly when the goal is to simulate familiar physical interactions. Although there are some initial investigations into providing haptics at a distance [3], we seem far from delivering haptic rendering that matches the quality of our visuals. In “ClaytricSurface: An Interactive Deformable Display with Dynamic Stiffness Control,” Toshiki Sato and his colleagues address the problem of delivering a haptic experience by proposing a shape-changing display. This research is another example of using sensing technology to deliver an analog experience.

We hope these articles help you appreciate the interplay of sensing hardware, signal processing, interaction models, and applications above and beyond the screen. It's difficult to imagine how a truly successful experience wouldn't address each of these aspects in full. For example, consider how the GUI evolved around the mouse, how the smartphone evolved around the touchscreen, and how in both cases surprisingly many aspects of the system had to be heavily reworked. Although some might find this daunting, the problem's multidisciplinary nature attracts researchers and inventors from many different fields. Meanwhile, the possibility of sparking the next wave of innovation in human–computer interaction seems tantalizingly within reach!


Andy Wilson is a principal researcher at Microsoft Research, where he manages the Natural Interaction Research group. His research involves using sensing technologies to enable new modes of human–computer interaction. His interests include gesture-based interfaces, computer vision, inertial sensing, and display technologies. He helped found Microsoft's Surface Computing group and pioneered Microsoft's efforts to commercialize depth cameras. Wilson received a PhD from the MIT Media Laboratory. Contact him at
Hrvoje Benko is a researcher at Microsoft Research. His research interests include augmented reality, computational illumination, surface computing, new input form factors and devices, and touch and freehand gestural input. He helped develop the Microsoft Touch Mouse, which lets users perform multitouch gestures on top of the mouse. He was a program cochair for the 2012 ACM Symposium on User Interface Software and Technology (UIST) and is the UIST 2014 general chair. Benko received his PhD in computer science from Columbia University, where he investigated augmented-reality techniques that combine immersive experiences with interactive tabletops. He's on the IEEE CG&A editorial board. Contact him at