Fourth IEEE International Conference on Multimodal Interfaces (ICMI'02)
Audiovisual Arrays for Untethered Spoken Interfaces
Pittsburgh, Pennsylvania
October 14-16, 2002
ISBN: 0-7695-1834-6
When faced with a distant speaker at a known location in a noisy environment, a microphone array can provide a significantly improved audio signal for speech recognition. Estimating the location of a speaker in a reverberant environment from audio information alone can be quite difficult, so we use an array of video cameras to aid localization. Stereo processing techniques are used on pairs of cameras, and foreground 3-D points are grouped to estimate the trajectory of people as they move in an environment. These trajectories are used to guide a microphone array beamformer. Initial results using this system for speech recognition demonstrate increased recognition rates compared to non-array processing techniques.
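The abstract does not say which beamforming method the system uses. As an illustration only, the sketch below assumes the simplest option, a delay-and-sum beamformer, to show how a speaker position (such as one taken from a vision-derived trajectory) could steer a microphone array. The function names, the integer-sample delays, and the 343 m/s speed of sound are assumptions made for this example and are not taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions, source_position, sample_rate):
    """Return per-microphone delays (whole samples) that time-align a source.

    mic_positions and source_position are (x, y, z) coordinates in metres.
    """
    distances = [math.dist(m, source_position) for m in mic_positions]
    farthest = max(distances)
    # Delay the nearer microphones so every channel lines up with the farthest.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate)
            for d in distances]


def delay_and_sum(channels, delays):
    """Average the channels after delaying each by its integer sample delay."""
    length = min(len(c) for c in channels)
    output = [0.0] * length
    for channel, delay in zip(channels, delays):
        for n in range(delay, length):
            output[n] += channel[n - delay]
    return [value / len(channels) for value in output]


if __name__ == "__main__":
    # Hypothetical two-microphone array and a speaker position on the same axis.
    mics = [(0.0, 0.0, 0.0), (3.43, 0.0, 0.0)]
    speaker = (-1.0, 0.0, 0.0)
    delays = steering_delays(mics, speaker, sample_rate=1000)

    # The same impulse reaches the second microphone 10 samples later.
    near = [0.0] * 20
    far = [0.0] * 20
    near[5] = 1.0
    far[15] = 1.0
    aligned = delay_and_sum([near, far], delays)
    print(delays, aligned[15])
```

In this toy setup the two impulses are aligned before averaging, so speech from the steered position adds coherently while sound from other directions does not. A practical system would likely need fractional delays and would update the delays as the tracked position changes.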
Citation:
Kevin Wilson, Vibhav Rangarajan, Neal Checka, Trevor Darrell, "Audiovisual Arrays for Untethered Spoken Interfaces," Fourth IEEE International Conference on Multimodal Interfaces (ICMI'02), p. 389, 2002