Issue No.11 - November (2011 vol.33)
pp: 2188-2202
Juergen Gall , ETH Zurich, Zurich
Angela Yao , ETH Zurich, Zurich
Nima Razavi , ETH Zurich, Zurich
Luc Van Gool , ETH Zurich, Zurich and IBBT, K.U. Leuven
Victor Lempitsky , University of Oxford, Oxford
ABSTRACT
The paper introduces Hough forests, which are random forests adapted to perform a generalized Hough transform in an efficient way. Compared to previous Hough-based systems such as implicit shape models, Hough forests improve the performance of the generalized Hough transform for object detection on a categorical level. At the same time, their flexibility permits extensions of the Hough transform to new domains such as object tracking and action recognition. Hough forests can be regarded as task-adapted codebooks of local appearance that allow fast supervised training and fast matching at test time. They achieve high detection accuracy since the entries of such codebooks are optimized to cast Hough votes with small variance and since their efficiency permits dense sampling of local image patches or video cuboids during detection. The efficacy of Hough forests for a set of computer vision tasks is validated through experiments on a large set of publicly available benchmark data sets and comparisons with the state of the art.
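The voting idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that a trained forest has already been reduced to a hypothetical lookup from a patch descriptor to a leaf, and that each leaf stores a foreground probability together with a list of displacements from the patch to the object center. The names `cast_votes`, `best_hypothesis`, and `TOY_LEAVES`, and the toy data, are illustrative assumptions only.

```python
from collections import defaultdict


def cast_votes(patches, leaf_lookup):
    """Accumulate weighted Hough votes for candidate object centers.

    patches: iterable of (x, y, descriptor) tuples sampled from an image.
    leaf_lookup: hypothetical stand-in for passing a patch through a
        trained tree; maps a descriptor to a pair
        (foreground_probability, [(dx, dy), ...]) where each (dx, dy)
        is an assumed displacement from the patch to the object center.
    """
    accumulator = defaultdict(float)
    for x, y, descriptor in patches:
        probability, offsets = leaf_lookup(descriptor)
        if not offsets:
            continue  # a leaf without stored offsets casts no votes
        # Spread the leaf's foreground probability over its offsets.
        weight = probability / len(offsets)
        for dx, dy in offsets:
            accumulator[(x + dx, y + dy)] += weight
    return accumulator


def best_hypothesis(accumulator):
    """Return the (center, score) pair with the highest accumulated score."""
    return max(accumulator.items(), key=lambda item: item[1])


# Toy "codebook" (assumed data): two object parts and one background entry.
TOY_LEAVES = {
    "wheel_left": (0.9, [(2, -1)]),
    "wheel_right": (0.9, [(-2, -1)]),
    "background": (0.1, [(1, 1), (4, 4)]),
}

toy_patches = [
    (3, 6, "wheel_left"),
    (7, 6, "wheel_right"),
    (0, 0, "background"),
]

votes = cast_votes(toy_patches, TOY_LEAVES.__getitem__)
center, score = best_hypothesis(votes)
print(center, score)
```

In this toy setting, the two part patches agree on a single center, so their votes reinforce each other, while the background patch scatters low-weight votes. Leaves whose stored offsets have small variance produce sharper peaks, which is the property the abstract attributes to the optimized codebook entries.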
INDEX TERMS
Hough transform, object detection, tracking, action recognition.
CITATION
Juergen Gall, Angela Yao, Nima Razavi, Luc Van Gool, Victor Lempitsky, "Hough Forests for Object Detection, Tracking, and Action Recognition," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 33, no. 11, pp. 2188-2202, November 2011, doi:10.1109/TPAMI.2011.70