Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 2
Visual Speech Recognition with Loosely Synchronized Feature Streams
Beijing, China
October 17-October 20
ISBN: 0-7695-2334-X
Kate Saenko, Massachusetts Institute of Technology
Karen Livescu, Massachusetts Institute of Technology
Michael Siracusa, Massachusetts Institute of Technology
Kevin Wilson, Massachusetts Institute of Technology
James Glass, Massachusetts Institute of Technology
Trevor Darrell, Massachusetts Institute of Technology
We present an approach to detecting and recognizing spoken isolated phrases based solely on visual input. We adopt an architecture that first employs discriminative detection of visual speech and articulatory features, and then performs recognition using a model that accounts for the loose synchronization of the feature streams. Discriminative classifiers detect the subclass of lip appearance corresponding to the presence of speech, and further decompose it into features corresponding to the physical components of articulatory production. These components often evolve in a semi-independent fashion, and conventional viseme-based approaches to recognition fail to capture the resulting co-articulation effects. We present a novel dynamic Bayesian network with a multi-stream structure and observations consisting of articulatory feature classifier scores, which can model varying degrees of co-articulation in a principled way. We evaluate our visual-only recognition system on a command utterance task. We show comparative results on lip detection and speech/nonspeech classification, as well as recognition performance against several baseline systems.
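The idea of loosely synchronized feature streams can be illustrated with a small sketch. It is not the authors' dynamic Bayesian network: it is a simplified Viterbi search over the joint state space of two left-to-right streams, assuming each stream supplies per-frame log-scores from a classifier and that the streams may drift apart by at most `max_async` states. The function name, the two-stream setup, and the `max_async` parameter are all hypothetical choices made for illustration.

```python
import math
from itertools import product


def loosely_synced_viterbi(scores_a, scores_b, max_async=1):
    """Best joint log-score for two left-to-right feature streams.

    scores_a[t][i]: log-score that stream A is in state i at frame t.
    scores_b[t][j]: log-score that stream B is in state j at frame t.
    max_async: largest allowed difference between the two state indices
               (an illustrative stand-in for a learned asynchrony model).
    """
    n_frames = len(scores_a)
    n_states = len(scores_a[0])
    neg_inf = -math.inf

    # Joint states (i, j) permitted by the asynchrony limit.
    allowed = [(i, j) for i, j in product(range(n_states), repeat=2)
               if abs(i - j) <= max_async]

    # Both streams must start in their first state.
    best = {s: neg_inf for s in allowed}
    best[(0, 0)] = scores_a[0][0] + scores_b[0][0]

    for t in range(1, n_frames):
        new = {}
        for (i, j) in allowed:
            # Each stream either stays in its state or advances by one.
            prev = max(best.get((i - di, j - dj), neg_inf)
                       for di in (0, 1) for dj in (0, 1))
            new[(i, j)] = prev + scores_a[t][i] + scores_b[t][j]
        best = new

    # Both streams must end in their last state.
    return best[(n_states - 1, n_states - 1)]


# Toy example: stream A changes state one frame before stream B.
a = [[0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
b = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0]]
print(loosely_synced_viterbi(a, b, max_async=1))  # 0.0
print(loosely_synced_viterbi(a, b, max_async=0))  # -5.0
```

With `max_async=0` the streams are forced to change state together, which behaves like a single-stream model and is penalized on this toy input; allowing one state of asynchrony lets each stream follow its own scores.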
Citation:
Kate Saenko, Karen Livescu, Michael Siracusa, Kevin Wilson, James Glass, Trevor Darrell, "Visual Speech Recognition with Loosely Synchronized Feature Streams," Tenth IEEE International Conference on Computer Vision (ICCV'05), vol. 2, pp. 1424-1431, 2005