2017 IEEE International Conference on Computer Vision (ICCV) (2017)
Venice, Italy
Oct. 22, 2017 to Oct. 29, 2017
ISSN: 2380-7504
ISBN: 978-1-5386-1032-9
pp: 5727-5736
ABSTRACT
We present a Temporal Context Network (TCN) for precise temporal localization of human activities. Similar to the Faster R-CNN architecture, proposals spanning multiple temporal scales are placed at equal intervals in a video. We propose a novel representation for ranking these proposals. Since pooling features only inside a segment is not sufficient to predict activity boundaries, we construct a representation which explicitly captures context around a proposal for ranking it. For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and input to a temporal convolutional neural network for classification. After ranking proposals, non-maximum suppression is applied and classification is performed to obtain final detections. TCN outperforms state-of-the-art methods on the ActivityNet and THUMOS14 datasets.
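ILLUSTRATIVE SKETCH
The paper page gives no reference code; the following is a minimal Python/NumPy sketch of the pipeline the abstract describes: multi-scale proposals placed at equal intervals, features sampled at a pair of scales (the segment plus its surrounding context) to build a ranking input, and temporal non-maximum suppression. All function names, strides, scales, and the context expansion factor are illustrative assumptions, not values from the paper.

    import numpy as np

    def generate_proposals(num_frames, stride=8, scales=(16, 32, 64, 128)):
        # Place one candidate segment per temporal scale at equally spaced
        # centers, mirroring the multi-scale anchor scheme in the abstract.
        # The stride and scale values are illustrative, not the paper's.
        proposals = []
        for center in range(0, num_frames, stride):
            for scale in scales:
                start = max(0, center - scale // 2)
                end = min(num_frames, center + scale // 2)
                if end > start:
                    proposals.append((start, end))
        return np.array(proposals, dtype=int)

    def pair_scale_features(features, start, end, num_samples=8, context=1.0):
        # features: (T, D) array of per-frame features.
        # Scale 1: uniform samples inside the proposal segment.
        inner = np.linspace(start, end - 1, num_samples).astype(int)
        # Scale 2: the proposal expanded on both sides by `context` * length
        # (a hypothetical expansion factor), so the representation explicitly
        # sees what surrounds the segment, as the abstract motivates.
        length = end - start
        c_start = max(0, int(start - context * length))
        c_end = min(features.shape[0], int(end + context * length))
        outer = np.linspace(c_start, c_end - 1, num_samples).astype(int)
        # A temporal ConvNet would consume this (2 * num_samples, D) input
        # to rank the proposal; here we only build the stacked samples.
        return np.concatenate([features[inner], features[outer]], axis=0)

    def temporal_nms(segments, scores, iou_thresh=0.5):
        # Greedy 1-D non-maximum suppression over scored segments.
        order = np.argsort(scores)[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            rest = order[1:]
            inter = np.maximum(0, np.minimum(segments[i, 1], segments[rest, 1])
                                  - np.maximum(segments[i, 0], segments[rest, 0]))
            union = ((segments[i, 1] - segments[i, 0])
                     + (segments[rest, 1] - segments[rest, 0]) - inter)
            order = rest[inter / np.maximum(union, 1e-8) <= iou_thresh]
        return keep

    # Example usage with random stand-ins for video features and TCN scores.
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((300, 128))    # 300 frames, 128-d features
    props = generate_proposals(num_frames=300)
    x = pair_scale_features(feats, *props[0])  # (16, 128) ranking input
    scores = rng.random(len(props))            # stand-in for learned scores
    detections = props[temporal_nms(props, scores)]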
INDEX TERMS
convolution, feature extraction, gesture recognition, image classification, image motion analysis, image representation, image segmentation, learning (artificial intelligence), neural nets, object detection, video signal processing
CITATION

X. Dai, B. Singh, G. Zhang, L. S. Davis and Y. Q. Chen, "Temporal Context Network for Activity Localization in Videos," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 5727-5736.
doi:10.1109/ICCV.2017.610