Issue No. 06 - June (2017 vol. 66)
ISSN: 0018-9340
pp: 996-1007
Wei-Yu Tsai, Pennsylvania State University, University Park, State College, PA
Davis R. Barch, IBM Research, Almaden Research Center, San Jose, CA
Andrew S. Cassidy, IBM Research, Almaden Research Center, San Jose, CA
Michael V. DeBole, IBM Research, Almaden Research Center, San Jose, CA
Alexander Andreopoulos, IBM Research, Almaden Research Center, San Jose, CA
Bryan L. Jackson, IBM Research, Almaden Research Center, San Jose, CA
Myron D. Flickner, IBM Research, Almaden Research Center, San Jose, CA
John V. Arthur, IBM Research, Almaden Research Center, San Jose, CA
Dharmendra S. Modha, IBM Research, Almaden Research Center, San Jose, CA
John Sampson, Pennsylvania State University, University Park, State College, PA
Vijaykrishnan Narayanan, Pennsylvania State University, University Park, State College, PA
ABSTRACT
Deep neural networks (DNNs) have been shown to be very effective at solving challenging problems in several areas of computing, including vision, speech, and natural language processing. However, traditional platforms for implementing these DNNs are often very power hungry, which has led to significant efforts in the development of configurable platforms capable of implementing these DNNs efficiently. One of these platforms, the IBM TrueNorth processor, has demonstrated very low operating power in performing visual computing and neural network classification tasks in real time. The neuron computation, synaptic memory, and communication fabrics are all configurable, so that a wide range of network types and topologies can be mapped to TrueNorth. This reconfigurability translates into the capability to support a wide range of low-power functions in addition to feed-forward DNN classifiers, including, for example, the audio processing functions presented here.
In this work, we propose an end-to-end audio processing pipeline that is implemented entirely on a TrueNorth processor and designed specifically to leverage the highly parallel, low-precision computing primitives TrueNorth offers. As part of this pipeline, we develop an audio feature extractor (LATTE) designed for implementation on TrueNorth, and explore the tradeoffs among several design variants in terms of accuracy, power, and performance. We customize the energy-efficient deep neuromorphic network structures that our design uses as the classifier and show how classifier parameters can trade power against accuracy. In addition to enabling a wide range of diverse functions, the reconfigurability of TrueNorth enables re-training and re-programming the system to satisfy varying energy, speed, area, and accuracy requirements.
The resulting system's end-to-end power consumption can be as low as 14.43 mW, which would give up to 100 hours of continuous usage with button cell batteries (CR3023, 1.5 Wh) or 450 hours with cellphone batteries (iPhone 6s, 6.55 Wh).
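The battery-life figures above follow from dividing stored energy by power draw. The short Python sketch below reproduces that arithmetic; it is an illustration only, not part of the paper. The function name `battery_life_hours` is hypothetical, and the calculation assumes an ideal battery with constant power draw and no conversion or self-discharge losses.

```python
def battery_life_hours(capacity_wh: float, power_mw: float) -> float:
    """Ideal runtime in hours: stored energy (Wh) divided by constant draw (W).

    Assumption: no regulator losses, no self-discharge, constant load.
    """
    power_w = power_mw / 1000.0  # convert milliwatts to watts
    return capacity_wh / power_w


# Figures quoted in the abstract.
SYSTEM_POWER_MW = 14.43   # lowest reported end-to-end power
BUTTON_CELL_WH = 1.5      # button cell capacity as stated
PHONE_BATTERY_WH = 6.55   # iPhone 6s battery capacity as stated

button_cell_hours = battery_life_hours(BUTTON_CELL_WH, SYSTEM_POWER_MW)
phone_hours = battery_life_hours(PHONE_BATTERY_WH, SYSTEM_POWER_MW)

print(f"Button cell: {button_cell_hours:.0f} h")
print(f"Phone battery: {phone_hours:.0f} h")
```

Under these idealized assumptions the results come out to roughly 104 and 454 hours, consistent with the rounded "up to 100 hours" and "450 hours" quoted in the abstract.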
INDEX TERMS
Feature extraction, Axons, Mel frequency cepstral coefficient, Pipelines, Fabrics, Programming
CITATION

W. Tsai et al., "Always-On Speech Recognition Using TrueNorth, a Reconfigurable, Neurosynaptic Processor," in IEEE Transactions on Computers, vol. 66, no. 6, pp. 996-1007, 2017.
doi:10.1109/TC.2016.2630683