Issue No. 08 - Aug. 2013 (vol. 35)
ISSN: 0162-8828
pp: 1944-1957
Dong Yu , Microsoft Research, Redmond
Li Deng , Microsoft Research, Redmond
Brian Hutchinson , University of Washington, Seattle
ABSTRACT
A novel deep architecture, the tensor deep stacking network (T-DSN), is presented. The T-DSN consists of multiple stacked blocks, where each block contains a bilinear mapping from two hidden layers to the output layer, using a weight tensor to incorporate higher order statistics of the hidden binary ([0,1]) features. A learning algorithm for the T-DSN's weight matrices and tensors is developed in which the main parameter estimation burden is shifted to a convex subproblem with a closed-form solution. Using an efficient and scalable parallel implementation for CPU clusters, we train sets of T-DSNs on three popular tasks, in increasing order of data size: handwritten digit recognition using MNIST (60k), isolated state/phone classification and continuous phone recognition using TIMIT (1.1 m), and isolated phone classification using WSJ0 (5.2 m). Experimental results on all three tasks consistently demonstrate the effectiveness of the T-DSN and the associated learning methods. In particular, sufficient depth of the T-DSN, a symmetric structure for the two hidden layers in each T-DSN block, our parameter learning algorithm, and a softmax layer on top of the T-DSN are all shown to contribute to the low error rates observed in the experiments on all three tasks.
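To make the bilinear construction described above concrete, the following is a minimal NumPy sketch of one block's forward pass and a ridge-regularized closed-form estimate of the (unfolded) weight tensor given fixed hidden layers. The function names, shapes, and the regularizer `reg` are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tdsn_block_forward(X, W1, W2, U):
    """Sketch of one T-DSN block's forward pass.

    X  : (n_samples, n_in)   input (raw features, possibly concatenated
                             with the previous block's predictions)
    W1 : (n_in, n_h1)        weights for the first hidden layer
    W2 : (n_in, n_h2)        weights for the second hidden layer
    U  : (n_h1 * n_h2, n_out) weight tensor, unfolded into a matrix

    Returns the two hidden layers and the block output, implementing the
    bilinear map as a linear map on the outer product of the hidden layers.
    """
    H1 = sigmoid(X @ W1)                        # first hidden layer, values in (0, 1)
    H2 = sigmoid(X @ W2)                        # second hidden layer, values in (0, 1)
    # Row-wise outer products h1 (x) h2, flattened to shape (n, n_h1 * n_h2)
    K = np.einsum("ni,nj->nij", H1, H2).reshape(len(X), -1)
    Y = K @ U                                   # bilinear prediction
    return H1, H2, Y

def fit_tensor_closed_form(H1, H2, T, reg=1e-3):
    """Closed-form least-squares estimate of the unfolded tensor U,
    given fixed hidden layers H1, H2 and target matrix T (one-hot labels)."""
    K = np.einsum("ni,nj->nij", H1, H2).reshape(len(H1), -1)
    A = K.T @ K + reg * np.eye(K.shape[1])      # ridge term assumed for stability
    return np.linalg.solve(A, K.T @ T)
```

In this sketch, only the lower-layer weights W1 and W2 would require iterative (gradient-based) updates; for any fixed hidden representations, the upper-layer tensor reduces to the convex least-squares subproblem solved in closed form above, which is the property the abstract highlights.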
INDEX TERMS
Computer architecture, Training, Tensile stress, Vectors, Stacking, Machine learning, Closed-form solutions, WSJ, Deep learning, stacking networks, tensor, bilinear models, handwriting image classification, phone classification and recognition, MNIST, TIMIT
CITATION
Dong Yu, Li Deng, Brian Hutchinson, "Tensor Deep Stacking Networks", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 8, pp. 1944-1957, Aug. 2013, doi:10.1109/TPAMI.2012.268