2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Honolulu, Hawaii, USA
July 21, 2017 to July 26, 2017
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CVPR.2017.390
This paper proposes efficient and powerful deep networks for action prediction from partially observed videos containing temporally incomplete action executions. Different from after-the-fact action recognition, action prediction task requires action labels to be predicted from these partially observed videos. Our approach exploits abundant sequential context information to enrich the feature representations of partial videos. We reconstruct missing information in the features extracted from partial videos by learning from fully observed action videos. The amount of the information is temporally ordered for the purpose of modeling temporal orderings of action segments. Label information is also used to better separate the learned features of different categories. We develop a new learning formulation that enables efficient model training. Extensive experimental results on UCF101, Sports-1M and BIT datasets demonstrate that our approach remarkably outperforms state-of-the-art methods, and is up to 300× faster than these methods. Results also show that actions differ in their prediction characteristics, some actions can be correctly predicted even though only the beginning 10% portion of videos is observed.
feature extraction, image motion analysis, learning (artificial intelligence), object recognition, video databases, video signal processing
Y. Kong, Z. Tao and Y. Fu, "Deep Sequential Context Networks for Action Prediction," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, USA, 2017, pp. 3662-3670.