Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers (1994)
Pacific Grove, CA, USA
Oct. 31, 1994 to Nov. 2, 1994
C. Bregler, Div. of Comput. Sci., Univ. of California, Berkeley, CA, USA
S.M. Omohundro, Div. of Comput. Sci., Univ. of California, Berkeley, CA, USA
We explore multimodal recognition by combining visual lipreading with acoustic speech recognition. We show that combining visual and acoustic speech information significantly improves recognition performance, especially in noisy environments. This is achieved with a hybrid speech recognition architecture consisting of a new visual learning and tracking mechanism, a channel-robust acoustic front end, a connectionist phone classifier, and an HMM-based sentence classifier. Our focus in this paper is on the visual subsystem, which is based on "surface learning" and active vision models. Our bimodal hybrid speech recognition system has already been applied to a multi-speaker spelling task, and work is in progress to apply it to a speaker-independent spontaneous speech task, the "Berkeley Restaurant Project (BeRP)".
speech recognition, acoustic signal processing, active vision, vision, tracking, hidden Markov models, spelling aids, multilayer perceptrons, feedforward neural nets, learning (artificial intelligence)
C. Bregler, S.M. Omohundro and Y. Konig, "A hybrid approach to bimodal speech recognition," Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers (ACSSC), Pacific Grove, CA, USA, 1994, pp. 556-560.
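The hybrid architecture described in the abstract can be sketched as a pipeline: fused audio-visual feature vectors feed a connectionist (MLP) phone classifier, whose per-frame phone posteriors are then decoded by an HMM. The sketch below is illustrative only; the feature dimensions, network sizes, early-fusion strategy, and uniform HMM parameters are assumptions, not the authors' exact configuration.

```python
# Illustrative sketch of a hybrid bimodal pipeline (assumptions throughout):
# fused audio+visual features -> MLP phone posteriors -> HMM Viterbi decode.
import numpy as np

rng = np.random.default_rng(0)

N_PHONES = 4                  # toy phone inventory (assumption)
AUDIO_DIM, VISUAL_DIM = 6, 3  # per-frame feature sizes (assumption)

def fuse(audio, visual):
    """Early fusion: concatenate acoustic and visual feature vectors."""
    return np.concatenate([audio, visual], axis=-1)

class PhoneMLP:
    """One-hidden-layer perceptron emitting phone posteriors via softmax
    (untrained random weights; stands in for the connectionist classifier)."""
    def __init__(self, in_dim, hidden, n_phones):
        self.W1 = rng.standard_normal((in_dim, hidden)) * 0.1
        self.W2 = rng.standard_normal((hidden, n_phones)) * 0.1
    def posteriors(self, x):
        h = np.tanh(x @ self.W1)
        z = h @ self.W2
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

def viterbi(log_emis, log_trans, log_prior):
    """Best state path through the HMM given per-frame log-emissions."""
    T, S = log_emis.shape
    delta = log_prior + log_emis[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emis[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy run: 5 frames of fused features -> posteriors -> decoded phone path.
T = 5
audio = rng.standard_normal((T, AUDIO_DIM))
visual = rng.standard_normal((T, VISUAL_DIM))
mlp = PhoneMLP(AUDIO_DIM + VISUAL_DIM, hidden=8, n_phones=N_PHONES)
post = mlp.posteriors(fuse(audio, visual))          # shape (T, N_PHONES)
log_trans = np.log(np.full((N_PHONES, N_PHONES), 1.0 / N_PHONES))
log_prior = np.log(np.full(N_PHONES, 1.0 / N_PHONES))
path = viterbi(np.log(post), log_trans, log_prior)
print(path)  # one decoded phone state per frame
```

With uniform transition and prior probabilities the decoded path reduces to the frame-wise argmax of the posteriors; a real system would use trained transition probabilities and a sentence-level HMM topology, where the Viterbi pass genuinely differs from per-frame classification.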