Acoustics, Speech, and Signal Processing, IEEE International Conference on (2009)
Taipei, Taiwan
Apr. 19, 2009 to Apr. 24, 2009
ISBN: 978-1-4244-2353-8
pp: 3861-3864
G. Heigold , Chair of Computer Science 6, RWTH Aachen University, Germany
G. Zweig , Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
X. Li , Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
P. Nguyen , Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
ABSTRACT
We introduce a direct model for speech recognition that assumes an unstructured, i.e., flat text output. The flat model allows us to model arbitrary attributes and dependencies of the output. This differs from the HMMs typically used for speech recognition: the conventional approach is based on sequential data and makes rigid assumptions about the dependencies. HMMs have proven convenient and appropriate for large vocabulary continuous speech recognition. The task under consideration here, however, is the Windows Live Search for Mobile (WLS4M) task [1], a cellphone application that allows users to interact with web-based information portals. In particular, the set of valid outputs can be considered discrete and finite (although probably large, i.e., unseen events are an issue). Hence, a flat direct model lends itself to this task, making the addition of different knowledge sources and dependencies straightforward and cheap. Using, e.g., HMM posterior, m-gram, and spotter features, significant improvements over the conventional HMM system were observed.
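The core idea of a flat direct model can be sketched as a log-linear distribution over a finite set of candidate outputs, p(w|x) ∝ exp(Σᵢ λᵢ fᵢ(x, w)). The sketch below is illustrative only: the feature functions (`word_overlap`, `length_penalty`) and weights are invented stand-ins, not the paper's actual HMM-posterior, m-gram, or spotter features, which are derived from a real recognizer.

```python
import math

def flat_direct_posterior(x, candidates, features, weights):
    """Score each candidate output w for input x with a log-linear
    model and normalize over the (finite) candidate set."""
    scores = {w: sum(lam * f(x, w) for f, lam in zip(features, weights))
              for w in candidates}
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}

# Toy feature functions (hypothetical, for illustration): word overlap
# with a mock 1-best transcript, and a length-mismatch penalty.
def word_overlap(x, w):
    return len(set(x.split()) & set(w.split()))

def length_penalty(x, w):
    return -abs(len(x.split()) - len(w.split()))

posterior = flat_direct_posterior(
    x="starbucks coffee seattle",
    candidates=["starbucks coffee", "starbucks coffee seattle", "pizza hut"],
    features=[word_overlap, length_penalty],
    weights=[1.0, 0.5],
)
best = max(posterior, key=posterior.get)
```

Because the output is treated as an unstructured whole rather than a frame-by-frame sequence, adding a new knowledge source is just appending another feature function and weight, which is the cheapness the abstract refers to.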
CITATION

G. Heigold, P. Nguyen, G. Zweig and X. Li, "A flat direct model for speech recognition," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, 2009, pp. 3861-3864.
doi:10.1109/ICASSP.2009.4960470