Issue No.04 - April (2013 vol.35)
pp: 882-897
C. Wojek , Max Planck Inst. for Inf., Saarbrucken, Germany
S. Walk , Photogrammetry & Remote Sensing Group, ETH Zurich, Zurich, Switzerland
S. Roth , GRIS, Tech. Univ. Darmstadt, Darmstadt, Germany
K. Schindler , Photogrammetry & Remote Sensing Group, ETH Zurich, Zurich, Switzerland
B. Schiele , Max Planck Inst. for Inf., Saarbrucken, Germany
ABSTRACT
Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presents a novel probabilistic 3D scene model that integrates state-of-the-art multiclass object detection, object tracking and scene labeling together with geometric 3D reasoning. Our model is able to represent complex object interactions such as inter-object occlusion, physical exclusion between objects, and geometric context. Inference in this model allows us to jointly recover the 3D scene context and perform 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. Contrary to many other approaches, our system performs explicit occlusion reasoning and is therefore capable of tracking objects that are partially occluded for extended periods of time, or objects that have never been observed to their full extent. In addition, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated for different types of challenging onboard sequences. We first show a substantial improvement to the state of the art in 3D multipeople tracking. Moreover, a similar performance gain is achieved for multiclass 3D tracking of cars and trucks on a challenging dataset.
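As a rough illustration of what inter-object occlusion means for a detector's evidence, the sketch below computes how much of each object's image box remains visible once nearer objects are drawn in front of it. This is not the paper's probabilistic 3D model: the `Box` class, the `visible_fractions` function, the integer pixel boxes, and the depth ordering are all assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical axis-aligned image box in integer pixels.

    x1 and y1 are exclusive; depth is the assumed distance from the camera.
    """
    x0: int
    y0: int
    x1: int
    y1: int
    depth: float


def visible_fractions(boxes: List[Box]) -> List[float]:
    """Return, for each box, the fraction of its pixels not covered by a nearer box."""
    fractions = []
    for i, box in enumerate(boxes):
        area = (box.x1 - box.x0) * (box.y1 - box.y0)
        # Only objects strictly closer to the camera can occlude this one.
        nearer = [o for j, o in enumerate(boxes) if j != i and o.depth < box.depth]
        visible = 0
        for y in range(box.y0, box.y1):
            for x in range(box.x0, box.x1):
                covered = any(o.x0 <= x < o.x1 and o.y0 <= y < o.y1 for o in nearer)
                if not covered:
                    visible += 1
        fractions.append(visible / area if area > 0 else 0.0)
    return fractions


if __name__ == "__main__":
    scene = [
        Box(0, 0, 10, 10, depth=5.0),   # nearest object, unoccluded
        Box(5, 0, 15, 10, depth=8.0),   # half hidden behind the first box
    ]
    print(visible_fractions(scene))
```

A tracker that reasons about occlusion could use such a visibility estimate to down-weight missing detector evidence for partially hidden objects instead of dropping their tracks; how the paper's model does this is not specified in the abstract.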
INDEX TERMS
Detectors, Cameras, Solid modeling, Cognition, Computational modeling, Hidden Markov models, Object detection, MCMC, Scene understanding, tracking, scene tracklets, tracking-by-detection
CITATION
C. Wojek, S. Walk, S. Roth, K. Schindler, B. Schiele, "Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.35, no. 4, pp. 882-897, April 2013, doi:10.1109/TPAMI.2012.174