Issue No.07 - July (2011 vol.33)
pp: 1310-1323
Yang Wang, University of Illinois at Urbana-Champaign, Urbana
Greg Mori, Simon Fraser University, Burnaby
ABSTRACT
We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (HCRF) for object recognition. As in HCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Unlike in object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, they demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying HCRF to local patches alone. We also propose an alternative method for learning the parameters of an HCRF model in a max-margin framework, which we call the max-margin hidden conditional random field (MMHCRF). We demonstrate that MMHCRF outperforms HCRF in human action recognition. In addition, MMHCRF can handle a much broader range of complex hidden structures arising in various problems in computer vision.
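The contrast between the probabilistic and max-margin treatments of hidden parts can be illustrated with a toy sketch. This is not the authors' implementation: the score table, the label names, and the function names below are all hypothetical, and the sketch assumes the hidden part assignments are few enough to enumerate. It only shows the general idea that an HCRF-style model sums over hidden assignments to obtain a label posterior, while a max-margin variant scores each label by its best hidden assignment and penalizes margin violations.

```python
import math

def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def hcrf_posterior(scores):
    """Probabilistic view: marginalize over hidden assignments.

    scores[y][h] is a toy joint score for label y and hidden assignment h
    (a hypothetical stand-in for a learned scoring function).
    """
    log_unnorm = {y: _log_sum_exp(list(hs.values())) for y, hs in scores.items()}
    log_z = _log_sum_exp(list(log_unnorm.values()))
    return {y: math.exp(v - log_z) for y, v in log_unnorm.items()}

def max_margin_scores(scores):
    """Max-margin view: score each label by its best hidden assignment."""
    return {y: max(hs.values()) for y, hs in scores.items()}

def multiclass_hinge(scores, true_label, margin=1.0):
    """Hinge loss: the true label should beat every other label by `margin`."""
    f = max_margin_scores(scores)
    best_wrong = max(v for y, v in f.items() if y != true_label)
    return max(0.0, margin + best_wrong - f[true_label])

# Toy scores for two action labels and two hidden part assignments (made up).
toy_scores = {
    "walk": {"h0": 2.0, "h1": 0.5},
    "run": {"h0": 1.0, "h1": 1.5},
}
posterior = hcrf_posterior(toy_scores)
loss = multiclass_hinge(toy_scores, true_label="walk")
```

In this toy setting both views prefer "walk", but the hinge loss is still positive because the best "run" score lies within the margin of the best "walk" score.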
INDEX TERMS
Human action recognition, part-based model, discriminative learning, max margin, hidden conditional random field.
CITATION
Yang Wang, Greg Mori, "Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 33, no. 7, pp. 1310-1323, July 2011, doi:10.1109/TPAMI.2010.214