Issue No. 08 - Aug. 2013 (vol. 62)
pp: 1607-1615
Louis-Marie Aubert, Application Solutions (Electronics and Vision) Limited, Lewes
Roger Woods, Queen's University Belfast, Belfast
Scott Fischaber, Analytics Engines Ltd, Belfast
Richard Veitch, Maxeler, London
ABSTRACT
There is considerable interest in creating embedded speech recognition hardware using the weighted finite state transducer (WFST) technique, but performance and memory usage remain challenging. Two system optimization techniques are presented to address this. The first improves token propagation by removing the WFST epsilon input arcs. The second is a one-pass, adaptive pruning algorithm that dramatically reduces the number of active nodes to be computed. Memory and bandwidth results for a 5,000-word vocabulary show better practical performance than conventional WFST; this is then exploited by the adaptive pruning algorithm, which reduces the active nodes from 30,000 to 4,000 with only a 2 percent sacrifice in speech recognition accuracy. Together, these optimizations lead to a simpler design with deterministic performance.
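The abstract does not spell out the pruning rule, so the following is only a minimal sketch of the general idea behind one-pass adaptive beam pruning, not the authors' algorithm. It assumes each frame is a mapping from node id to path cost (lower is better) and uses a simple feedback rule to steer the number of active nodes toward a target. The function names, the `step` factor, and the beam limits are all hypothetical.

```python
def prune_frame(costs, beam):
    """Single pass: keep nodes whose cost is within `beam` of the best cost."""
    best = min(costs.values())
    return {node: c for node, c in costs.items() if c <= best + beam}


def update_beam(beam, n_active, target, step=0.1, min_beam=1.0, max_beam=200.0):
    """Hypothetical feedback rule: narrow the beam when too many nodes
    survive, widen it when too few do, and clamp it to a fixed range."""
    if n_active > target:
        beam *= 1.0 - step
    elif n_active < target:
        beam *= 1.0 + step
    return max(min_beam, min(max_beam, beam))


def decode_with_adaptive_beam(frames, target, beam=50.0):
    """Prune every frame with the current beam, then adapt the beam for the
    next frame. Returns (active node count, beam used) per frame."""
    history = []
    for costs in frames:
        active = prune_frame(costs, beam)
        history.append((len(active), beam))
        beam = update_beam(beam, len(active), target)
    return history


if __name__ == "__main__":
    # Synthetic data (an assumption): 100 nodes with costs 0.0 .. 99.0 per frame.
    frames = [{i: float(i) for i in range(100)} for _ in range(30)]
    for count, used_beam in decode_with_adaptive_beam(frames, target=10):
        print(count, round(used_beam, 2))
```

Because the beam is adjusted from the previous frame's count rather than by sorting the current frame's tokens, each frame needs only one pass over the active nodes, which is the property that makes this style of pruning attractive for hardware.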
INDEX TERMS
Hidden Markov models, Speech recognition, Bandwidth, Speech, Decoding, Acoustics, Loading, WFST, Embedded processors, Memory organization
CITATION
Louis-Marie Aubert, Roger Woods, Scott Fischaber, Richard Veitch, "Optimization of Weighted Finite State Transducer for Speech Recognition", IEEE Transactions on Computers, vol. 62, no. 8, pp. 1607-1615, Aug. 2013, doi:10.1109/TC.2013.51