3D Convolutional Neural Networks for Human Action Recognition
Jan. 2013 (vol. 35 no. 1)
pp. 221-231
Shuiwang Ji, Dept. of Comput. Sci., Old Dominion Univ., Norfolk, VA, USA
Wei Xu, Facebook, Inc., Menlo Park, CA, USA
Ming Yang, NEC Labs. America, Inc., Cupertino, CA, USA
Kai Yu, Baidu Inc., Beijing, China
We consider the automated recognition of human actions in surveillance videos. Most current methods build classifiers based on complex handcrafted features computed from the raw inputs. Convolutional neural networks (CNNs) are a type of deep model that can act directly on the raw inputs. However, such models are currently limited to handling 2D inputs. In this paper, we develop a novel 3D CNN model for action recognition. This model extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The developed model generates multiple channels of information from the input frames, and the final feature representation combines information from all channels. To further boost the performance, we propose regularizing the outputs with high-level features and combining the predictions of a variety of different models. We apply the developed models to recognize human actions in the real-world environment of airport surveillance videos, and they achieve superior performance in comparison to baseline methods.
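The 3D convolution described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `conv3d_valid`, the clip size (7 frames of 60x40 pixels), and the averaging kernel are assumptions chosen only for illustration. It shows a single kernel that spans several adjacent frames, so each output value mixes spatial and temporal information.

```python
import numpy as np

def conv3d_valid(frames, kernel):
    """Naive 'valid' 3D convolution (cross-correlation form, as is usual in CNNs).

    frames: array of shape (T, H, W), a stack of T grayscale frames.
    kernel: array of shape (kt, kh, kw), spanning kt adjacent frames.
    Returns an array of shape (T - kt + 1, H - kh + 1, W - kw + 1).
    """
    T, H, W = frames.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Each output value sees a small cube of pixels across kt frames.
                cube = frames[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(cube * kernel)
    return out

# Hypothetical input: a clip of 7 frames, each 60x40 pixels (assumed sizes).
frames = np.random.default_rng(0).random((7, 60, 40))

# Hypothetical kernel: averages a 7x7 patch over 3 adjacent frames.
kernel = np.ones((3, 7, 7)) / (3 * 7 * 7)

features = conv3d_valid(frames, kernel)
print(features.shape)  # (5, 54, 34)
```

The output keeps a temporal axis of length 5, because every output value depends on 3 adjacent input frames. A 2D convolution applied to each frame separately would not combine information across time in this way.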
Index Terms:
video surveillance, feature extraction, gesture recognition, image classification, image motion analysis, image representation, neural nets, spatiotemporal phenomena, baseline methods, 3D convolutional neural networks, automated human action recognition, complex handcrafted features, deep model, 3D CNN model, temporal dimensions, spatial dimensions, motion information encoding, feature representation, high-level features, airport surveillance videos, Three dimensional displays, Solid modeling, Feature extraction, Computer architecture, Videos, Kernel, Computational modeling, action recognition, Deep learning, convolutional neural networks, 3D convolution, model combination
Citation:
Shuiwang Ji, Wei Xu, Ming Yang, Kai Yu, "3D Convolutional Neural Networks for Human Action Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, Jan. 2013, doi:10.1109/TPAMI.2012.59