Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers (1994)
Pacific Grove, CA, USA
Oct. 31, 1994 to Nov. 2, 1994
P.L. Silsbee, Dept. of Electr. & Comput. Eng., Old Dominion Univ., Norfolk, VA, USA
Methods of integrating audio and visual information in an audiovisual HMM-based ASR system are investigated. Experiments involve discrimination of a set of 22 consonants, with various integration strategies. The role of the visual subsystem is varied; for example, in one run, the subsystem attempts to classify all 22 consonants, while in other runs it attempts only broader classifications. In a second experiment, a new HMM formulation is employed, which incorporates the integration into the HMM at a pre-categorical stage. A single variable parameter allows the relative contribution of audio and visual information to be controlled. This form of integration can be very easily incorporated into existing audio-based continuous speech recognizers.
speech recognition, audio-visual systems, hidden Markov models
P. Silsbee, "Sensory integration in audiovisual automatic speech recognition," Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers (ACSSC), Pacific Grove, CA, USA, 1995, pp. 561-565.
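The single-parameter pre-categorical integration described in the abstract is commonly realized in multi-stream HMMs as an exponent-weighted (geometric) combination of the audio and visual observation likelihoods within each state. A minimal sketch of that standard technique, assuming this is the form of weighting the paper intends (the function name and variable names here are hypothetical):

```python
import math

def combined_log_likelihood(log_p_audio, log_p_visual, lam):
    """Fuse per-state observation log-likelihoods from the audio and
    visual streams. In probability space this is the geometric mixture
    p_a^lam * p_v^(1 - lam), so lam in [0, 1] sets the relative
    contribution of the audio stream. (Illustrative sketch only.)"""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * log_p_audio + (1.0 - lam) * log_p_visual

# lam = 1 ignores the visual stream entirely; lam = 0 ignores audio.
fused = combined_log_likelihood(math.log(0.2), math.log(0.6), 0.5)
```

Because the fusion happens inside the state likelihood computation, an existing audio-only Viterbi decoder can use the fused score in place of its acoustic score without any other structural change, which matches the abstract's claim that the integration is easily added to existing recognizers.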