Issue No.12 - December (2009 vol.31)
pp: 2179-2195
Markus Enzweiler, University of Heidelberg, Heidelberg
Dariu M. Gavrila, Daimler AG Group Research, Ulm and University of Amsterdam, Amsterdam
ABSTRACT
Pedestrian detection is a rapidly evolving area in computer vision with key applications in intelligent vehicles, surveillance, and advanced robotics. The objective of this paper is to provide an overview of the current state of the art from both methodological and experimental perspectives. The first part of the paper consists of a survey. We cover the main components of a pedestrian detection system and the underlying models. The second (and larger) part of the paper contains a corresponding experimental study. We consider a diverse set of state-of-the-art systems: wavelet-based AdaBoost cascade [74], HOG/linSVM [11], NN/LRF [75], and combined shape-texture detection [23]. Experiments are performed on an extensive data set captured onboard a vehicle driving through an urban environment. The data set includes many thousands of training samples as well as a 27-minute test sequence involving more than 20,000 images with annotated pedestrian locations. We consider a generic evaluation setting and one specific to pedestrian detection onboard a vehicle. Results indicate a clear advantage of HOG/linSVM at higher image resolutions and lower processing speeds, and a superiority of the wavelet-based AdaBoost cascade approach at lower image resolutions and (near) real-time processing speeds. The data set (8.5 GB) is made public for benchmarking purposes.
INDEX TERMS
Pedestrian detection, survey, performance analysis, benchmarking.
CITATION
Markus Enzweiler, Dariu M. Gavrila, "Monocular Pedestrian Detection: Survey and Experiments", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.31, no. 12, pp. 2179-2195, December 2009, doi:10.1109/TPAMI.2008.260
REFERENCES
[1] S. Agarwal, A. Awan, and D. Roth, “Learning to Detect Objects in Images via a Sparse, Part-Based Representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1475-1490, Nov. 2004.
[2] I.P. Alonso et al., “Combination of Feature Extraction Methods for SVM Pedestrian Detection,” IEEE Trans. Intelligent Transportation Systems, vol. 8, no. 2, pp. 292-307, June 2007.
[3] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A Tutorial on Particle Filters for On-Line Non-Linear/Non-Gaussian Bayesian Tracking,” IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 174-188, Feb. 2002.
[4] A. Baumberg, “Hierarchical Shape Fitting Using an Iterated Linear Filter,” Proc. British Machine Vision Conf., pp. 313-323, 1996.
[5] M. Bergtholdt, D. Cremers, and C. Schnörr, “Variational Segmentation with Shape Priors,” Handbook of Math. Models in Computer Vision, N. Paragios, Y. Chen, and O. Faugeras, eds., Springer, 2005.
[6] G. Borgefors, “Distance Transformations in Digital Images,” Computer Vision, Graphics, and Image Processing, vol. 34, no. 3, pp. 344-371, 1986.
[7] A. Broggi, A. Fascioli, I. Fedriga, A. Tibaldi, and M.D. Rose, “Stereo-Based Preprocessing for Human Shape Localization in Unstructured Environments,” Proc. IEEE Intelligent Vehicles Symp., pp. 410-415, 2003.
[8] T.F. Cootes, S. Marsland, C.J. Twining, K. Smith, and C.J. Taylor, “Groupwise Diffeomorphic Non-Rigid Registration for Automatic Model Building,” Proc. European Conf. Computer Vision, pp. 316-327, 2004.
[9] T.F. Cootes and C.J. Taylor, “Statistical Models of Appearance for Computer Vision,” technical report, Univ. of Manchester, 2004.
[10] N. Dalal, “Finding People in Images and Videos,” PhD thesis, Institut Nat'l Polytechnique de Grenoble, 2006.
[11] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 886-893, 2005.
[12] N. Dalal, B. Triggs, and C. Schmid, “Human Detection Using Oriented Histograms of Flow and Appearance,” Proc. European Conf. Computer Vision, pp. 428-441, 2006.
[13] J. Deutscher, A. Blake, and I.D. Reid, “Articulated Body Motion Capture by Annealed Particle Filtering,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 126-133, 2000.
[14] M. Enzweiler and D.M. Gavrila, “A Mixed Generative-Discriminative Framework for Pedestrian Classification,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2008.
[15] M. Enzweiler, P. Kanter, and D.M. Gavrila, “Monocular Pedestrian Recognition Using Motion Parallax,” Proc. IEEE Intelligent Vehicles Symp., pp. 792-797, 2008.
[16] A. Ess, B. Leibe, and L. van Gool, “Depth and Appearance for Mobile Scene Analysis,” Proc. Int'l Conf. Computer Vision, 2007.
[17] L. Fan, K.-K. Sung, and T.-K. Ng, “Pedestrian Registration in Static Images with Unconstrained Background,” Pattern Recognition, vol. 36, pp. 1019-1029, 2003.
[18] Y. Freund and R.E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” Proc. European Conf. Computational Learning Theory, pp. 23-37, 1995.
[19] K. Fukushima, S. Miyake, and T. Ito, “Neocognitron: A Neural Network Model for a Mechanism of Visual Pattern Recognition,” IEEE Trans. Systems, Man, and Cybernetics, vol. 13, pp. 826-834, 1983.
[20] T. Gandhi and M.M. Trivedi, “Pedestrian Protection Systems: Issues, Survey, and Challenges,” IEEE Trans. Intelligent Transportation Systems, vol. 8, no. 3, pp. 413-430, Sept. 2007.
[21] D.M. Gavrila, “The Visual Analysis of Human Movement: A Survey,” Computer Vision and Image Understanding, vol. 73, no. 1, pp. 82-98, 1999.
[22] D.M. Gavrila, “A Bayesian Exemplar-Based Approach to Hierarchical Shape Matching,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 8, pp. 1408-1421, Aug. 2007.
[23] D.M. Gavrila and S. Munder, “Multi-Cue Pedestrian Detection and Tracking from a Moving Vehicle,” Int'l J. Computer Vision, vol. 73, no. 1, pp. 41-59, 2007.
[24] B.E. Goldstein, Sensation and Perception, sixth ed. Wadsworth, 2002.
[25] T. Heap and D. Hogg, “Improving Specificity in PDMs Using a Hierarchical Approach,” Proc. British Machine Vision Conf., pp. 80-89, 1997.
[26] T. Heap and D. Hogg, “Wormholes in Shape Space: Tracking through Discontinuous Changes in Shape,” Proc. Int'l Conf. Computer Vision, pp. 344-349, 1998.
[27] B. Heisele and C. Wöhler, “Motion-Based Recognition of Pedestrians,” Proc. Int'l Conf. Pattern Recognition, pp. 1325-1330, 1998.
[28] INRIA Person Dataset, http://pascal.inrialpes.fr/data/human/, 2007.
[29] Intel OpenCV Library, http://www.intel.com/technology/computing/opencv/, 2007.
[30] M. Isard and A. Blake, “CONDENSATION—Conditional Density Propagation for Visual Tracking,” Int'l J. Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[31] M. Isard and A. Blake, “ICONDENSATION: Unifying Low-Level and High-Level Tracking in a Stochastic Framework,” Proc. Int'l Conf. Computer Vision, pp. 893-908, 1998.
[32] M. Isard and J. MacCormick, “BraMBLE: A Bayesian Multiple-Blob Tracker,” Proc. Int'l Conf. Computer Vision, pp. 34-41, 2001.
[33] A.K. Jain, R.P.W. Duin, and J. Mao, “Statistical Pattern Recognition: A Review,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, Jan. 2000.
[34] M.J. Jones and T. Poggio, “Multidimensional Morphable Models,” Proc. Int'l Conf. Computer Vision, pp. 683-688, 1998.
[35] H. Kang and D. Kim, “Real-Time Multiple People Tracking Using Competitive Condensation,” Pattern Recognition, vol. 38, no. 7, pp. 1045-1058, 2005.
[36] Z. Khan, T. Balch, and F. Dellaert, “MCMC-Based Particle Filtering for Tracking a Variable Number of Interacting Targets,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 11, pp. 1805-1819, Nov. 2005.
[37] H.W. Kuhn, “The Hungarian Method for the Assignment Problem,” Naval Research Logistics Quarterly, vol. 2, pp. 83-97, 1955.
[38] S. Lee, Y. Liu, and R. Collins, “Shape Variation-Based Frieze Pattern for Robust Gait Recognition,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2007.
[39] B. Leibe, N. Cornelis, K. Cornelis, and L.V. Gool, “Dynamic 3D Scene Analysis from a Moving Vehicle,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2007.
[40] B. Leibe, E. Seemann, and B. Schiele, “Pedestrian Detection in Crowded Scenes,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 878-885, 2005.
[41] R. Lienhart and J. Maydt, “An Extended Set of Haar-Like Features for Rapid Object Detection,” Proc. Int'l Conf. Image Processing, pp. 900-903, 2002.
[42] D.G. Lowe, “Distinctive Image Features from Scale Invariant Keypoints,” Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[43] J. MacCormick and A. Blake, “Partitioned Sampling, Articulated Objects and Interface-Quality Hand Tracking,” Proc. European Conf. Computer Vision, pp. 3-19, 2000.
[44] J. MacCormick and A. Blake, “A Probabilistic Exclusion Principle for Tracking Multiple Objects,” Int'l J. Computer Vision, vol. 39, no. 1, pp. 57-71, 2000.
[45] K. Mikolajczyk, C. Schmid, and A. Zisserman, “Human Detection Based on a Probabilistic Assembly of Robust Part Detectors,” Proc. European Conf. Computer Vision, pp. 69-81, 2004.
[46] MIT CBCL Pedestrian Database, http://cbcl.mit.edu/cbcl/software-datasets/PedestrianData.html, 2008.
[47] T.B. Moeslund and E. Granum, “A Survey of Advances in Vision-Based Human Motion Capture and Analysis,” Computer Vision and Image Understanding, vol. 103, nos. 2/3, pp. 90-126, 2006.
[48] A. Mohan, C. Papageorgiou, and T. Poggio, “Example-Based Object Detection in Images by Components,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 4, pp. 349-361, Apr. 2001.
[49] S. Munder and D.M. Gavrila, “An Experimental Study on Pedestrian Classification,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 11, pp. 1863-1868, Nov. 2006.
[50] S. Munder, C. Schnörr, and D.M. Gavrila, “Pedestrian Detection and Tracking Using a Mixture of View-Based Shape-Texture Models,” IEEE Trans. Intelligent Transportation Systems, vol. 9, no. 2, pp. 333-343, June 2008.
[51] C. Nakajima, M. Pontil, B. Heisele, and T. Poggio, “Full-Body Recognition System,” Pattern Recognition, vol. 36, pp. 1997-2006, 2003.
[52] K. Okuma, A. Taleghani, N. de Freitas, J. Little, and D. Lowe, “A Boosted Particle Filter: Multitarget Detection and Tracking,” Proc. European Conf. Computer Vision, pp. 28-39, 2004.
[53] C. Papageorgiou and T. Poggio, “A Trainable System for Object Detection,” Int'l J. Computer Vision, vol. 38, pp. 15-33, 2000.
[54] PETS Data sets, http://www.cvg.rdg.ac.uk/slides/pets.html, 2007.
[55] V. Philomin, R. Duraiswami, and L.S. Davis, “Quasi-Random Sampling for Condensation,” Proc. European Conf. Computer Vision, pp. 134-149, 2000.
[56] R. Polana and R. Nelson, “Low-Level Recognition of Human Motion,” Proc. IEEE Workshop Motion of Non-Rigid and Articulated Objects, pp. 77-92, 1994.
[57] R. Poppe, “Vision-Based Human Motion Analysis: An Overview,” Computer Vision and Image Understanding, vol. 108, pp. 4-18, 2007.
[58] D. Ramanan, D.A. Forsyth, and A. Zisserman, “Strike a Pose: Tracking People by Finding Stylized Poses,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 271-278, 2005.
[59] T. Randen and J.H. Husøy, “Filtering for Texture Classification: A Comparative Study,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 4, pp. 291-310, Apr. 1999.
[60] P. Sabzmeydani and G. Mori, “Detecting Pedestrians by Learning Shapelet Features,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2007.
[61] E. Seemann, M. Fritz, and B. Schiele, “Towards Robust Pedestrian Detection in Crowded Image Sequences,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2007.
[62] A. Shashua, Y. Gdalyahu, and G. Hayon, “Pedestrian Detection for Driving Assistance Systems: Single-Frame Classification and System Level Performance,” Proc. IEEE Intelligent Vehicles Symp., pp. 1-6, 2004.
[63] V.D. Shet, J. Neumann, V. Ramesh, and L.S. Davis, “Bilattice-Based Logical Reasoning for Human Detection,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2007.
[64] H. Shimizu and T. Poggio, “Direction Estimation of Pedestrian from Multiple Still Images,” Proc. IEEE Intelligent Vehicles Symp., pp. 596-600, 2004.
[65] H. Sidenbladh and M.J. Black, “Learning the Statistics of People in Images and Video,” Int'l J. Computer Vision, vol. 54, nos. 1-3, pp. 183-209, 2003.
[66] M. Spengler and B. Schiele, “Towards Robust Multi-Cue Integration for Visual Tracking,” Machine Vision and Applications, vol. 14, no. 1, pp. 50-58, 2003.
[67] B. Stenger, A. Thayananthan, P.H.S. Torr, and R. Cipolla, “Model-Based Hand Tracking Using a Hierarchical Bayesian Filter,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1372-1385, Sept. 2006.
[68] M. Szarvas, A. Yoshizawa, M. Yamamoto, and J. Ogata, “Pedestrian Detection with Convolutional Neural Networks,” Proc. IEEE Intelligent Vehicles Symp., pp. 223-228, 2005.
[69] L. Taycher, G. Shakhnarovich, D. Demirdjian, and T. Darrell, “Conditional Random People: Tracking Humans with CRFs and Grid Filters,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 222-229, 2006.
[70] K. Toyama and A. Blake, “Probabilistic Tracking with Exemplars in a Metric Space,” Int'l J. Computer Vision, vol. 48, no. 1, pp. 9-19, 2002.
[71] O. Tuzel, F. Porikli, and P. Meer, “Human Detection via Classification on Riemannian Manifolds,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2007.
[72] I. Ulusoy and C.M. Bishop, “Generative versus Discriminative Methods for Object Recognition,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 258-265, 2005.
[73] V.N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1995.
[74] P. Viola, M. Jones, and D. Snow, “Detecting Pedestrians Using Patterns of Motion and Appearance,” Int'l J. Computer Vision, vol. 63, no. 2, pp. 153-161, 2005.
[75] C. Wöhler and J. Anlauf, “An Adaptable Time-Delay Neural-Network Algorithm for Image Sequence Analysis,” IEEE Trans. Neural Networks, vol. 10, no. 6, pp. 1531-1536, Nov. 1999.
[76] B. Wu and R. Nevatia, “Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet Based Part Detectors,” Int'l J. Computer Vision, vol. 75, no. 2, pp. 247-266, 2007.
[77] Y. Wu and T. Yu, “A Field Model for Human Detection and Tracking,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 5, pp. 753-765, May 2006.
[78] K. Zapien, J. Fehr, and H. Burkhardt, “Fast Support Vector Machine Classification Using Linear SVMs,” Proc. Int'l Conf. Pattern Recognition, pp. 366-369, 2006.
[79] H. Zhang, A. Berg, M. Maire, and J. Malik, “SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2006.
[80] L. Zhang, B. Wu, and R. Nevatia, “Detection and Tracking of Multiple Humans with Extensive Pose Articulation,” Proc. Int'l Conf. Computer Vision, 2007.
[81] L. Zhao and C. Thorpe, “Stereo and Neural Network-Based Pedestrian Detection,” IEEE Trans. Intelligent Transportation Systems, vol. 1, no. 3, pp. 148-154, Sept. 2000.
[82] T. Zhao and R. Nevatia, “Tracking Multiple Humans in Complex Situations,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1208-1221, Sept. 2004.
[83] Q. Zhu, S. Avidan, M. Yeh, and K. Cheng, “Fast Human Detection Using a Cascade of Histograms of Oriented Gradients,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 1491-1498, 2006.