2005 IEEE International Conference on Multimedia and Expo
Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition
Amsterdam, Netherlands
July 06-July 06
ISBN: 0-7803-9331-7
Jing Huang, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA, jghg@us.ibm.com
Multi-stream hidden Markov models (HMMs) have recently been very successful in audio-visual speech recognition, where the audio and visual streams are fused at the final decision level. In this paper we investigate fast feature-space speaker adaptation using multi-stream HMMs for audio-visual speech recognition. In particular, we study the performance of feature-space maximum likelihood linear regression (fMLLR), a fast and effective method for estimating feature-space transforms. Unlike the common speaker adaptation techniques of MAP and MLLR, fMLLR does not change the audio or visual HMM parameters; it simply applies a single transform to the test features. We also address the problem of fast and robust on-line fMLLR adaptation using feature-space maximum a posteriori linear regression (fMAPLR). Adaptation experiments are reported on the IBM infrared headset audio-visual database. Averaged over a 20-speaker, 1-hour independent test set, multi-stream fMLLR achieves a 31% relative gain in the clean audio condition and a 59% relative gain in the noisy audio condition (approximately 7 dB), compared to the baseline multi-stream system.
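As an illustration of the statement that fMLLR leaves the HMM parameters untouched and applies a single transform to the features, the following minimal sketch applies one affine map x' = Ax + b to every frame of an utterance. It is not the authors' implementation: the function names, the 2-dimensional features, and the values of A and b are hypothetical, and the maximum-likelihood estimation of A and b from adaptation data is not shown.

```python
def apply_affine(A, b, x):
    """Return A*x + b for one feature vector x (all plain Python lists)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def transform_utterance(A, b, frames):
    """Apply the same speaker transform (A, b) to every frame."""
    return [apply_affine(A, b, x) for x in frames]


# Hypothetical speaker transform and features (illustrative values only).
A = [[2.0, 0.0],
     [0.0, 0.5]]
b = [1.0, -1.0]
frames = [[1.0, 2.0],
          [0.0, 4.0]]

# The acoustic model is unchanged; only the features are mapped.
adapted = transform_utterance(A, b, frames)
print(adapted)  # [[3.0, 0.0], [1.0, 1.0]]
```

In a multi-stream setting one would presumably keep a separate (A, b) pair per stream, for example one for audio features and one for visual features; that is an assumption about usage, not a detail given in the abstract.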
Citation:
Jing Huang, E. Marcheret, K. Visweswariah, "Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition," 2005 IEEE International Conference on Multimedia and Expo (ICME), pp. 338-341, 2005.