2001 IEEE International Conference on Multimedia and Expo (ICME'01)
AN ADAPTIVE INTEGRATION BASED ON PRODUCT HMM FOR AUDIO-VISUAL SPEECH RECOGNITION
Tokyo, Japan
August 22-August 25
ISBN: 0-7695-1198-8
Demand has grown recently for Automatic Speech Recognition (ASR) systems that operate robustly in acoustically noisy environments. This paper proposes a method for effectively integrating audio and visual information in audio-visual (bimodal) ASR systems. Two issues are important for such integration: (1) the synchronization of the audio and visual information, and (2) the optimization of the system for its environment. Regarding (1), the speech and lip-movement features are correlated, but there is a time lag between them. To address this problem, we introduce an integration method using HMM composition. Regarding (2), we examine whether the GPD algorithm can adaptively estimate the stream weights. Evaluation experiments show that the proposed method improves recognition accuracy for noisy speech.
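As a rough illustration of the two ideas named in the abstract, the sketch below shows one common way to build a product (composite) HMM transition matrix and to combine per-stream log-likelihoods with stream weights. It is not taken from the paper: the function names, the assumption that the two chains transition independently, and the constraint that the weights sum to 1 are all illustrative assumptions, and the GPD-based weight estimation itself is not shown.

```python
def combine_stream_loglik(loglik_audio, loglik_visual, audio_weight):
    """Weighted sum of per-stream log-likelihoods.

    Assumption (not from the abstract): the visual weight is
    1 - audio_weight, so the two stream weights sum to 1.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * loglik_audio + (1.0 - audio_weight) * loglik_visual


def product_transitions(trans_audio, trans_visual):
    """Transition matrix over composite (audio state, visual state) pairs.

    Assumption: the audio and visual chains move independently, so each
    composite transition probability is the product of the per-stream
    probabilities (the Kronecker product of the two matrices).
    Composite state (i, j) is stored at index i * n_visual + j.
    """
    n_a, n_v = len(trans_audio), len(trans_visual)
    size = n_a * n_v
    product = [[0.0] * size for _ in range(size)]
    for i in range(n_a):
        for j in range(n_v):
            for k in range(n_a):
                for m in range(n_v):
                    product[i * n_v + j][k * n_v + m] = (
                        trans_audio[i][k] * trans_visual[j][m]
                    )
    return product


# Hypothetical two-state left-to-right models for each modality.
trans_audio = [[0.6, 0.4], [0.0, 1.0]]
trans_visual = [[0.7, 0.3], [0.0, 1.0]]
composite = product_transitions(trans_audio, trans_visual)

# Hypothetical per-frame log-likelihoods, with the audio stream weighted 0.7.
score = combine_stream_loglik(-10.0, -20.0, audio_weight=0.7)
```

Because the composite state space contains every pairing of an audio state with a visual state, the two streams can occupy different states at the same frame, which is how a model of this kind can tolerate a time lag between speech and lip movement. Lowering `audio_weight` shifts trust toward the visual stream, which is the quantity one would want to adapt as acoustic noise increases.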
Citation:
Kenichi Kumatani, Satoshi Nakamura, Kiyohiro Shikano, "AN ADAPTIVE INTEGRATION BASED ON PRODUCT HMM FOR AUDIO-VISUAL SPEECH RECOGNITION," icme, pp.207, 2001 IEEE International Conference on Multimedia and Expo (ICME'01), 2001