Fourth IEEE International Conference on Multimodal Interfaces (ICMI'02)
Audiovisual Arrays for Untethered Spoken Interfaces
Pittsburgh, Pennsylvania
October 14-16, 2002
ISBN: 0-7695-1834-6
Kevin Wilson, Massachusetts Institute of Technology
Vibhav Rangarajan, Massachusetts Institute of Technology
Neal Checka, Massachusetts Institute of Technology
Trevor Darrell, Massachusetts Institute of Technology
When faced with a distant speaker at a known location in a noisy environment, a microphone array can provide a significantly improved audio signal for speech recognition. Estimating the location of a speaker in a reverberant environment from audio information alone can be quite difficult, so we use an array of video cameras to aid localization. Stereo processing techniques are used on pairs of cameras, and foreground 3-D points are grouped to estimate the trajectory of people as they move in an environment. These trajectories are used to guide a microphone array beamformer. Initial results using this system for speech recognition demonstrate increased recognition rates compared to non-array processing techniques.
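The abstract does not say which beamforming algorithm is used. As a rough illustration only, the sketch below assumes the simplest option, a delay-and-sum beamformer steered at a 3-D speaker position such as one supplied by a vision tracker. The function names, the 343 m/s speed of sound, and the array geometry in the usage note are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def steering_delays(mic_positions, source_position, c=SPEED_OF_SOUND):
    """Arrival delay (seconds) at each microphone relative to the closest one.

    mic_positions: array of shape (num_mics, 3), in metres
    source_position: array of shape (3,), in metres
    """
    distances = np.linalg.norm(mic_positions - source_position, axis=1)
    return (distances - distances.min()) / c


def delay_and_sum(signals, mic_positions, source_position, sample_rate,
                  c=SPEED_OF_SOUND):
    """Time-align every channel on the source position, then average.

    signals: array of shape (num_mics, num_samples)
    The fractional delays are applied as phase shifts in the frequency
    domain, so the shift is circular over the analysis window.
    """
    num_mics, num_samples = signals.shape
    delays = steering_delays(mic_positions, source_position, c)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / sample_rate)
    spectra = np.fft.rfft(signals, axis=1)
    # Advancing a channel by tau multiplies its spectrum by exp(+j*2*pi*f*tau).
    phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = np.fft.irfft(spectra * phase, n=num_samples, axis=1)
    return aligned.mean(axis=0)
```

A hypothetical use: with `mics` holding the microphone coordinates and `speaker_xyz` holding the current position estimate, `delay_and_sum(frames, mics, speaker_xyz, 16000)` returns one enhanced channel for the recognizer. Sound arriving from the steered position adds coherently across channels, while sound from other positions is partly cancelled.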
Citation:
Kevin Wilson, Vibhav Rangarajan, Neal Checka, Trevor Darrell, "Audiovisual Arrays for Untethered Spoken Interfaces," Fourth IEEE International Conference on Multimodal Interfaces (ICMI'02), p. 389, 2002.