Acoustics, Speech, and Signal Processing, IEEE International Conference on (2009)
Taipei, Taiwan
Apr. 19, 2009 to Apr. 24, 2009
ISBN: 978-1-4244-2353-8
pp: 3949-3952
Martin Wöllmer, Institute for Human-Machine Communication, Technische Universität München, Germany
Florian Eyben, Institute for Human-Machine Communication, Technische Universität München, Germany
Joseph Keshet, Idiap Research Institute, Martigny, Switzerland
Alex Graves, Institute for Computer Science VI, Technische Universität München, Germany
Björn Schuller, Institute for Human-Machine Communication, Technische Universität München, Germany
Gerhard Rigoll, Institute for Human-Machine Communication, Technische Universität München, Germany
ABSTRACT
In this paper we propose a new technique for robust keyword spotting that uses bidirectional Long Short-Term Memory (BLSTM) recurrent neural nets to incorporate contextual information in speech decoding. Our approach overcomes the drawbacks of generative HMM modeling by applying a discriminative learning procedure that non-linearly maps speech features into an abstract vector space. By incorporating the outputs of a BLSTM network into the speech features, the keyword spotter can make use of both past and future context for phoneme predictions. The robustness of the approach is evaluated on a keyword spotting task using the HUMAINE Sensitive Artificial Listener (SAL) database, which contains accented, spontaneous, and emotionally colored speech. The test is particularly stringent because the system is not trained on the SAL database, but only on the TIMIT corpus of read speech. We show that our method prevails over a discriminative keyword spotter without BLSTM-enhanced feature functions, which in turn has been shown to outperform HMM-based techniques.
CITATION

M. Wöllmer, F. Eyben, J. Keshet, A. Graves, B. Schuller and G. Rigoll, "Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional LSTM networks," Acoustics, Speech, and Signal Processing, IEEE International Conference on (ICASSP), Taipei, Taiwan, 2009, pp. 3949-3952.
doi:10.1109/ICASSP.2009.4960492