Issue No.09 - September (2010 vol.32)
pp: 1627-1645
Pedro F. Felzenszwalb, University of Chicago, Chicago
Ross B. Girshick, University of Chicago, Chicago
David McAllester, Toyota Technological Institute at Chicago, Chicago
Deva Ramanan, University of California, Irvine, Irvine
ABSTRACT
We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
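The alternating scheme described in the abstract can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration rather than the authors' implementation. Each example is a "bag" of candidate feature vectors and the latent variable picks one row. The objective is assumed to be an L2-regularized hinge loss over the best-scoring row. The inner optimizer is plain subgradient descent. The names `score`, `train_latent_svm`, and `w0` are hypothetical. The sketch omits the paper's image features, part models, mixtures, and hard-negative data mining.

```python
import numpy as np

def score(w, bag):
    """Return (best score, latent index) for one example.

    `bag` is an (n_candidates, dim) array; the latent variable selects a row.
    """
    s = bag @ w
    z = int(np.argmax(s))
    return float(s[z]), z

def train_latent_svm(pos_bags, neg_bags, w0, C=1.0,
                     outer_iters=5, inner_iters=200, lr=0.01):
    """Toy alternating training loop (illustrative assumption, not the paper's code)."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(outer_iters):
        # Step 1: fix the latent value of every positive example
        # to its best-scoring candidate under the current weights.
        z_pos = [score(w, bag)[1] for bag in pos_bags]
        # Step 2: with positive latent values fixed, the assumed objective
        # is convex in w; take subgradient steps on it.
        for _ in range(inner_iters):
            grad = w.copy()  # gradient of the 0.5 * ||w||^2 regularizer
            for bag, z in zip(pos_bags, z_pos):
                if bag[z] @ w < 1.0:   # positive inside the margin
                    grad -= C * bag[z]
            for bag in neg_bags:
                s, z = score(w, bag)   # negatives keep the max over candidates
                if s > -1.0:           # negative inside the margin
                    grad += C * bag[z]
            w -= lr * grad
    return w

# Hypothetical toy data: columns are (feature 0, feature 1, bias).
# Each positive bag hides one "object" row with feature 0 equal to 2.
obj = [2.0, 0.0, 1.0]
bg = [[-1.0, 1.0, 1.0], [-1.0, -1.0, 1.0], [-1.0, 0.0, 1.0]]
pos_bags = [np.array([obj, bg[0], bg[1]]),
            np.array([bg[0], obj, bg[1]]),
            np.array([bg[1], bg[2], obj])]
neg_bags = [np.array(bg) for _ in range(3)]

# A rough initial guess that feature 0 matters; initialization is an input here.
w = train_latent_svm(pos_bags, neg_bags, w0=[1.0, 0.0, 0.0])
pos_scores = [score(w, b)[0] for b in pos_bags]
neg_scores = [score(w, b)[0] for b in neg_bags]
```

The initial weights are passed in explicitly because the result of such an alternating procedure depends on where it starts. A poor starting point can fix the positives to the wrong candidates and leave the loop in a bad local optimum.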
INDEX TERMS
Object recognition, deformable models, pictorial structures, discriminative training, latent SVM.
CITATION
Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, Deva Ramanan, "Object Detection with Discriminatively Trained Part-Based Models", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.32, no. 9, pp. 1627-1645, September 2010, doi:10.1109/TPAMI.2009.167