2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (2018)
Lake Tahoe, NV, USA
Mar 12, 2018 to Mar 15, 2018
This paper addresses the problem of detecting relevant motion caused by objects of interest (e.g., people and vehicles) in large-scale home surveillance videos. The traditional method usually consists of two separate steps: detecting moving objects with background subtraction running on the camera, and filtering out nuisance motion events with deep-learning-based object detection and tracking running in the cloud. This method is extremely slow, and because it relies on a pre-trained off-the-shelf object detector, it does not fully leverage spatial-temporal redundancies. To dramatically speed up relevant motion event detection and improve its performance, we propose a novel network for relevant motion event detection, ReMotENet, which is a unified, end-to-end data-driven method using spatial-temporal attention-based 3D ConvNets to jointly model the appearance and motion of objects of interest in a video. ReMotENet parses an entire video clip in one forward pass of a neural network to achieve significant speedup. It exploits the properties of home surveillance videos and enhances 3D ConvNets with a spatial-temporal attention model and frame differencing to encourage the network to focus on the relevant moving objects. Experiments demonstrate that our method can achieve comparable or even better performance than the object-detection-based method, but with a speedup of three to four orders of magnitude (up to 20,000×) on GPU devices. Our network is efficient, compact and lightweight. It can detect relevant motion in a 15 s surveillance video clip within 4-8 milliseconds on a GPU and within a fraction of a second (0.17-0.39 s) on a CPU, with a model size of less than 1 MB.
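The abstract names frame differencing as one of the cues that steers the network toward moving objects, but it does not say how the differencing is computed or fed to the 3D ConvNet. The following is therefore only a minimal sketch of generic frame differencing on a toy clip, not the authors' implementation. The helper names (`frame_difference`, `difference_clip`, `moving_fraction`), the threshold of 20, and the toy clip are all assumptions made for illustration.

```python
from typing import List

# A grayscale frame as rows of pixel intensities (0-255).
Frame = List[List[int]]


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Per-pixel absolute difference between two equally sized frames."""
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def difference_clip(frames: List[Frame]) -> List[Frame]:
    """Difference each frame with its predecessor (len(frames) - 1 maps)."""
    return [frame_difference(a, b) for a, b in zip(frames, frames[1:])]


def moving_fraction(diff: Frame, threshold: int = 20) -> float:
    """Fraction of pixels whose change exceeds a hypothetical threshold."""
    total = sum(len(row) for row in diff)
    moving = sum(1 for row in diff for value in row if value > threshold)
    return moving / total if total else 0.0


# Toy 3-frame clip (assumed data): one bright pixel moves left to right
# along the bottom row of a dark 2x3 background.
clip = [
    [[0, 0, 0], [200, 0, 0]],
    [[0, 0, 0], [0, 200, 0]],
    [[0, 0, 0], [0, 0, 200]],
]
diffs = difference_clip(clip)
```

In this sketch the static background cancels to zero and only the pixels the object leaves or enters remain non-zero, which is the property that makes differenced frames a useful motion cue for a downstream model.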
image motion analysis, learning (artificial intelligence), neural nets, object detection, video surveillance
R. Yu, H. Wang and L. S. Davis, "ReMotENet: Efficient Relevant Motion Event Detection for Large-Scale Home Surveillance Videos," 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 2018, pp. 1642-1651.