2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Speaker detection using the timing structure of lip motion and sound
Anchorage, AK, USA
June 23-June 28
ISBN: 978-1-4244-2339-2
Yu Horii, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo, 6068501, Japan
Hiroaki Kawashima, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo, 6068501, Japan
Takashi Matsuyama, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo, 6068501, Japan
In this paper, we propose a novel approach to speaker detection that integrates audio-visual information using timing structure as the cue. We first extract feature sequences of lip motion and sound and segment each of them into temporal intervals. Then, we construct a cross-media timing-structure model of human speech by learning the temporal relations between overlapping intervals. Based on the learned model, we detect the speaker by evaluating the timing structure of the observed video and audio. Our experimental result shows the effectiveness of using the temporal relations of intervals for speaker detection.
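The pipeline summarized in the abstract has three steps: segment each stream into intervals, learn the temporal relations of overlapping lip and audio intervals, and score observed data against the learned model. A toy sketch can make that flow concrete. This is not the authors' model. Thresholding for segmentation, onset/offset differences as the relation features, and independent Gaussians as the timing model are all simplifying assumptions. Every name here (`segment`, `overlapping_pairs`, `timing_diffs`, `TimingModel`, `detect_speaker`) is hypothetical, and so is the synthetic 2-frame lag in the toy data.

```python
import math
from typing import List, Tuple

# A temporal interval as (begin_frame, end_frame), end exclusive.
Interval = Tuple[int, int]


def segment(signal: List[float], threshold: float) -> List[Interval]:
    """Split a 1-D feature sequence into intervals where it exceeds `threshold`.
    Hypothetical stand-in for the paper's segmentation step."""
    intervals: List[Interval] = []
    start = None
    for t, v in enumerate(signal):
        if v > threshold and start is None:
            start = t  # interval opens
        elif v <= threshold and start is not None:
            intervals.append((start, t))  # interval closes
            start = None
    if start is not None:  # close an interval still open at the end
        intervals.append((start, len(signal)))
    return intervals


def overlapping_pairs(lip: List[Interval], audio: List[Interval]):
    """All (lip, audio) interval pairs that overlap in time."""
    return [(x, y) for x in lip for y in audio if x[0] < y[1] and y[0] < x[1]]


def timing_diffs(pairs) -> List[Tuple[int, int]]:
    """Assumed relation features: audio onset minus lip onset, audio offset minus lip offset."""
    return [(y[0] - x[0], y[1] - x[1]) for x, y in pairs]


class TimingModel:
    """Independent Gaussians over the onset and offset differences (an assumption)."""

    def fit(self, diffs: List[Tuple[int, int]], var_floor: float = 1.0) -> "TimingModel":
        n = len(diffs)
        self.mu = [sum(d[i] for d in diffs) / n for i in range(2)]
        # Floor the variance so a perfectly regular training clip does not yield zero variance.
        self.var = [max(sum((d[i] - self.mu[i]) ** 2 for d in diffs) / n, var_floor) for i in range(2)]
        return self

    def log_likelihood(self, d: Tuple[int, int]) -> float:
        return sum(
            -0.5 * math.log(2 * math.pi * self.var[i]) - (d[i] - self.mu[i]) ** 2 / (2 * self.var[i])
            for i in range(2)
        )

    def score(self, diffs: List[Tuple[int, int]]) -> float:
        """Mean log-likelihood; -inf when no lip interval overlaps any audio interval."""
        if not diffs:
            return float("-inf")
        return sum(self.log_likelihood(d) for d in diffs) / len(diffs)


def detect_speaker(model: TimingModel, lip_streams: List[List[float]], audio: List[float]) -> int:
    """Index of the candidate whose lip timing best matches the audio."""
    audio_iv = segment(audio, 0.5)
    scores = [model.score(timing_diffs(overlapping_pairs(segment(s, 0.5), audio_iv))) for s in lip_streams]
    return max(range(len(scores)), key=scores.__getitem__)


# Synthetic toy data: the audio stream lags the true speaker's lip stream by 2 frames.
lip = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
audio = [0, 0] + lip[:-2]
model = TimingModel().fit(timing_diffs(overlapping_pairs(segment(lip, 0.5), segment(audio, 0.5))))

# A second candidate whose lip motion is not synchronized with the audio.
other = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(detect_speaker(model, [other, lip], audio))  # -> 1
```

Averaging the log-likelihood over overlapping pairs keeps candidates with different numbers of intervals comparable. A candidate with no overlapping intervals at all receives `-inf` and cannot be selected over one that does.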
Citation:
Yu Horii, Hiroaki Kawashima, Takashi Matsuyama, "Speaker detection using the timing structure of lip motion and sound," cvprw, pp.1-8, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008