2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (2018)
Lake Tahoe, NV, USA
Mar 12, 2018 to Mar 15, 2018
Deep convolutional neural networks have been highly successful in image-based recognition tasks. However, it is still unclear how to model the temporal evolution of videos effectively with deep networks. While recent deep models for videos show improvement by incorporating optical flow or aggregating high-level appearance across frames, they focus on modeling either long-term temporal relations or short-term motion. We propose Temporal Difference Networks (TDN) that model both long-term relations and short-term motion from videos. We leverage a simple but effective motion representation, the difference of CNN features, and jointly model the motion at multiple scales in a single CNN. TDN achieves state-of-the-art performance on three different video classification benchmarks, showing the effectiveness of our approach for learning temporal relations in videos.
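The motion representation named in the abstract, a difference of CNN features, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the sketch assumes that "multiple scales" can be approximated by several temporal strides, and it uses toy per-frame feature vectors in place of real CNN activations. The function name `temporal_differences` and the `strides` parameter are hypothetical.

```python
import numpy as np

def temporal_differences(features, strides=(1, 2, 4)):
    """Return {stride: features[t + stride] - features[t]} for each stride.

    features: array of shape (T, D), one D-dimensional feature vector per
    frame (a stand-in for CNN features; an assumption of this sketch).
    Small strides capture short-term motion, larger strides capture
    longer-term change.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    diffs = {}
    for s in strides:
        if s >= num_frames:
            continue  # not enough frames to form a difference at this stride
        diffs[s] = features[s:] - features[:-s]
    return diffs

# Toy input: 8 frames with 3-dimensional features that grow linearly in time.
feats = np.arange(8, dtype=float)[:, None] * np.ones((1, 3))
diffs = temporal_differences(feats)

for stride, d in diffs.items():
    print(stride, d.shape)
```

In a full model, the difference maps at each stride would presumably be fed to further layers and fused; how that is done is left open here because the abstract does not describe it.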
feedforward neural nets, image classification, image motion analysis, image recognition, image representation, video signal processing
J. Y. Ng and L. S. Davis, "Temporal Difference Networks for Video Action Recognition," 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 2018, pp. 1587-1596.