Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes
April 2013 (vol. 35 no. 4)
pp. 882-897
C. Wojek, Max Planck Inst. for Inf., Saarbrucken, Germany
S. Walk, Photogrammetry & Remote Sensing Group, ETH Zurich, Zurich, Switzerland
S. Roth, GRIS, Tech. Univ. Darmstadt, Darmstadt, Germany
K. Schindler, Photogrammetry & Remote Sensing Group, ETH Zurich, Zurich, Switzerland
B. Schiele, Max Planck Inst. for Inf., Saarbrucken, Germany
Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presents a novel probabilistic 3D scene model that integrates state-of-the-art multiclass object detection, object tracking and scene labeling together with geometric 3D reasoning. Our model is able to represent complex object interactions such as inter-object occlusion, physical exclusion between objects, and geometric context. Inference in this model allows us to jointly recover the 3D scene context and perform 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. Contrary to many other approaches, our system performs explicit occlusion reasoning and is therefore capable of tracking objects that are partially occluded for extended periods of time, or objects that have never been observed to their full extent. In addition, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated for different types of challenging onboard sequences. We first show a substantial improvement to the state of the art in 3D multipeople tracking. Moreover, a similar performance gain is achieved for multiclass 3D tracking of cars and trucks on a challenging dataset.
Index Terms:
video surveillance, automobiles, computer graphics, computer vision, image motion analysis, image representation, inference mechanisms, natural scenes, object detection, object recognition, object tracking, observers, traffic engineering computing, multiobject traffic scene understanding, context modeling, computer vision, probabilistic 3D scene model, multiclass object detection, scene labeling, geometric 3D reasoning, complex object interaction representation, inference mechanism, mobile observer, monocular video, occlusion reasoning, 3D multipeople tracking, 3D multiclass object tracking, cars, trucks, monocular visual scene understanding, Detectors, Cameras, Solid modeling, Cognition, Computational modeling, Hidden Markov models, Object detection, MCMC, Scene understanding, tracking, scene tracklets, tracking-by-detection
C. Wojek, S. Walk, S. Roth, K. Schindler, B. Schiele, "Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 4, pp. 882-897, April 2013, doi:10.1109/TPAMI.2012.174