Fourth IEEE International Conference on Multimodal Interfaces (ICMI'02)
Multi-Modal Temporal Asynchronicity Modeling by Product HMMs for Robust
Pittsburgh, Pennsylvania
October 14-October 16
ISBN: 0-7695-1834-6
Demand for audio-visual speech recognition (AVSR) has recently increased as a way to make speech recognition systems robust to acoustic noise. AVSR research faces two main issues: integration modeling that accounts for asynchronicity between the modalities, and adaptive information weighting according to the reliability of each information source. This paper proposes a method to integrate audio and visual information effectively. Such integration requires modeling both the synchrony and the asynchrony of the audio and visual information. To address the time lag and the correlation between individual speech and lip-movement features, we introduce an integrated HMM model of audio-visual information based on a family of product HMMs. The proposed model can represent state synchronicity not only within a phoneme but also between phonemes. We also propose a rapid stream weight optimization based on the GPD algorithm for noisy bimodal speech recognition. Evaluation experiments show that the proposed method improves recognition accuracy for noisy speech. At an SNR of 0 dB, the proposed method attained 16% higher performance than a product HMM without the synchronicity re-estimation.
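As a rough illustration of the two ideas named in the abstract, the sketch below builds the composite state space of a generic product HMM from one audio chain and one visual chain, and combines the per-stream emission scores with stream weights. It is a minimal sketch under stated assumptions, not the authors' model: the function names, the 3-state left-to-right matrices, and the convention that the two weights sum to 1 are all hypothetical, and the paper's synchronicity re-estimation and GPD-based weight optimization are not shown.

```python
import numpy as np


def product_transitions(a_audio, a_visual):
    """Transition matrix over composite (audio_state, visual_state) pairs.

    Assumption: the two chains move independently, so a composite
    transition probability is the product of the per-stream ones.
    Composite index = audio_index * n_visual + visual_index.
    """
    return np.kron(a_audio, a_visual)


def weighted_log_emission(log_b_audio, log_b_visual, audio_weight):
    """Stream-weighted log emission score for every composite state.

    Assumption: weights are non-negative and sum to 1.
    """
    visual_weight = 1.0 - audio_weight
    combined = (audio_weight * log_b_audio[:, None]
                + visual_weight * log_b_visual[None, :])
    # Flatten in the same order that np.kron uses for composite states.
    return combined.ravel()


# Hypothetical 3-state left-to-right chains for one phoneme.
a_audio = np.array([[0.6, 0.4, 0.0],
                    [0.0, 0.7, 0.3],
                    [0.0, 0.0, 1.0]])
a_visual = np.array([[0.8, 0.2, 0.0],
                     [0.0, 0.5, 0.5],
                     [0.0, 0.0, 1.0]])
a_product = product_transitions(a_audio, a_visual)

# Hypothetical per-state log-likelihoods for a single frame.
log_b_audio = np.log(np.array([0.5, 0.3, 0.2]))
log_b_visual = np.log(np.array([0.1, 0.6, 0.3]))
scores = weighted_log_emission(log_b_audio, log_b_visual, audio_weight=0.7)
```

Because every pair of audio and visual states is a composite state, the audio chain can run ahead of or behind the visual chain, which is how a product HMM tolerates a time lag between the streams. Lowering `audio_weight` shifts trust toward the visual stream, which is the usual motivation for stream weighting when the acoustic signal is noisy.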
Citation:
Satoshi Nakamura, Ken'ichi Kumatani, Satoshi Tamura, "Multi-Modal Temporal Asynchronicity Modeling by Product HMMs for Robust," icmi, pp.305, Fourth IEEE International Conference on Multimodal Interfaces (ICMI'02), 2002