Pedestrian Detection: An Evaluation of the State of the Art
April 2012 (vol. 34 no. 4)
pp. 743-761
C. Wojek, Max Planck Inst. for Inf., Saarbrucken, Germany
P. Dollar, Dept. of Electr. Eng., California Inst. of Technol., Pasadena, CA, USA
B. Schiele, Max Planck Inst. for Inf., Saarbrucken, Germany
P. Perona, Dept. of Electr. Eng., California Inst. of Technol., Pasadena, CA, USA
Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple data sets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: 1) We put together a large, well-annotated, and realistic monocular pedestrian detection data set and study the statistics of the size, position, and occlusion patterns of pedestrians in urban scenes, 2) we propose a refined per-frame evaluation methodology that allows us to carry out probing and informative comparisons, including measuring performance in relation to scale and occlusion, and 3) we evaluate the performance of sixteen pretrained state-of-the-art detectors across six data sets. Our study allows us to assess the state of the art and provides a framework for gauging future efforts. Our experiments show that despite significant progress, performance still has much room for improvement. In particular, detection is disappointing at low resolutions and for partially occluded pedestrians.

Index Terms:
traffic engineering computing, computer vision, object detection, partially occluded pedestrian, pedestrian detection, quality of life, monocular image, urban scene, state-of-the-art detector, detectors, pixel, cameras, training, testing, heating, labeling, Caltech Pedestrian data set, benchmark, evaluation, data set
Citation:
C. Wojek, P. Dollar, B. Schiele, P. Perona, "Pedestrian Detection: An Evaluation of the State of the Art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 743-761, April 2012, doi:10.1109/TPAMI.2011.155