Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers (1994)
Pacific Grove, CA, USA
Oct. 31, 1994 to Nov. 2, 1994
ISSN: 1058-6393
ISBN: 0-8186-6405-3
pp: 561-565
P.L. Silsbee, Dept. of Electr. & Comput. Eng., Old Dominion Univ., Norfolk, VA, USA
Methods of integrating audio and visual information in an audiovisual HMM-based ASR system are investigated. Experiments involve discrimination of a set of 22 consonants, with various integration strategies. The role of the visual subsystem is varied; for example, in one run, the subsystem attempts to classify all 22 consonants, while in other runs it attempts only broader classifications. In a second experiment, a new HMM formulation is employed, which incorporates the integration into the HMM at a pre-categorical stage. A single variable parameter allows the relative contribution of audio and visual information to be controlled. This form of integration can be very easily incorporated into existing audio-based continuous speech recognizers.
speech recognition, audio-visual systems, hidden Markov models
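
The abstract does not state how the single weighting parameter enters the HMM. As an illustration only, the sketch below assumes one common realization from multistream HMMs: the per-state emission log-likelihoods of the audio and visual streams are combined as a weighted sum before decoding, so the fusion happens before any category decision is made. The names `fuse_log_emissions`, `viterbi_log_score`, and `audio_weight` are hypothetical and are not taken from the paper.

```python
def fuse_log_emissions(log_b_audio, log_b_video, audio_weight):
    """Combine per-state log emission scores for one frame.

    Assumed form (not taken from the paper):
        log b_j = w * log b_j_audio + (1 - w) * log b_j_video
    where w = audio_weight. Inputs are assumed to be finite log scores.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return [
        audio_weight * a + (1.0 - audio_weight) * v
        for a, v in zip(log_b_audio, log_b_video)
    ]


def viterbi_log_score(log_init, log_trans, log_emis_frames):
    """Best-path log score of an HMM from per-frame, per-state log emissions."""
    n = len(log_init)
    # Initialise with the first frame.
    score = [log_init[j] + log_emis_frames[0][j] for j in range(n)]
    # Standard Viterbi recursion over the remaining frames.
    for frame in log_emis_frames[1:]:
        score = [
            max(score[i] + log_trans[i][j] for i in range(n)) + frame[j]
            for j in range(n)
        ]
    return max(score)


def score_audiovisual(log_init, log_trans, audio_frames, video_frames, audio_weight):
    """Fuse the two streams frame by frame, then decode with one HMM."""
    fused = [
        fuse_log_emissions(a, v, audio_weight)
        for a, v in zip(audio_frames, video_frames)
    ]
    return viterbi_log_score(log_init, log_trans, fused)
```

Under this assumption, setting `audio_weight` to 1.0 reduces the model to an audio-only recognizer and 0.0 to a visual-only one, and only the emission scores change while the transition structure and decoder are untouched. That would be consistent with the abstract's remark that this form of integration fits easily into existing audio-based recognizers.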

