Discriminative Latent Models for Recognizing Contextual Group Activities
Aug. 2012 (vol. 34 no. 8)
pp. 1549-1562
Weilong Yang, Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Yang Wang, Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
Tian Lan, Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
S. N. Robinovitch, Sch. of Eng. Sci., Simon Fraser Univ., Burnaby, BC, Canada
G. Mori, Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
In this paper, we go beyond recognizing the actions of individuals and focus on group activities. This is motivated by the observation that human actions are rarely performed in isolation; the contextual information of what other people in the scene are doing provides a useful cue for understanding high-level activities. We propose a novel framework for recognizing group activities that jointly captures the group activity, the individual person actions, and the interactions among them. Two types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. In particular, we propose three different approaches to model the person-person interaction. The first approach explores the structure of person-person interaction. Unlike most previous latent structured models, which assume a predefined structure for the hidden layer (e.g., a tree structure), we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. The second approach explores person-person interaction at the feature level. We introduce a new feature representation called the action context (AC) descriptor, which encodes information about not only the action of an individual person in the video but also the behavior of other people nearby. The third approach combines the above two. Our experimental results demonstrate the benefit of using contextual information for disambiguating group activities.
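The abstract does not give the exact construction of the AC descriptor. As a rough illustration only, the sketch below assumes every detected person in a frame already has a vector of per-action classifier scores. It builds a descriptor for a focal person by concatenating that person's own scores with, for each distance ring around them, the element-wise maximum of the neighbours' scores. The function name `action_context_descriptor`, the ring radii, the zero fill for empty rings, and the restriction to a single frame are all hypothetical choices, not the paper's definition.

```python
import numpy as np


def action_context_descriptor(positions, scores, i,
                              ring_edges=(0.0, 1.0, 2.0, 4.0),
                              empty_value=0.0):
    """Hypothetical action-context-style descriptor for person i.

    positions:  (n, 2) array of person centres in one frame (assumed units).
    scores:     (n, K) array of per-person action classifier scores (assumed given).
    ring_edges: increasing radii; consecutive pairs define [lo, hi) rings
                around person i (an illustrative choice).
    empty_value: fill used when no neighbour falls in a ring (an assumption).

    Returns a vector of length K * len(ring_edges): person i's own scores,
    followed by one max-pooled K-vector per ring.
    """
    positions = np.asarray(positions, dtype=float)
    scores = np.asarray(scores, dtype=float)
    _, k = scores.shape

    # Euclidean distance from the focal person to everyone in the frame.
    dists = np.linalg.norm(positions - positions[i], axis=1)

    context = []
    for lo, hi in zip(ring_edges[:-1], ring_edges[1:]):
        # Neighbours (excluding person i) whose distance lies in [lo, hi).
        mask = (dists >= lo) & (dists < hi)
        mask[i] = False
        if mask.any():
            # Max-pool each action's score over the people in this ring.
            context.append(scores[mask].max(axis=0))
        else:
            context.append(np.full(k, empty_value))

    return np.concatenate([scores[i]] + context)


if __name__ == "__main__":
    pos = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
    sc = [[1.0, 0.0], [0.2, 0.9], [0.7, 0.1]]
    print(action_context_descriptor(pos, sc, 0))
```

Because the context part has a fixed length regardless of how many people are in the scene, the resulting vector can be fed to an ordinary linear classifier. Adding temporal rings over neighbouring frames would be one natural extension of this sketch.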

Index Terms:
trees (mathematics), computer vision, image motion analysis, human activity recognition, discriminative latent models, contextual group activities recognition, contextual information, individual person actions, group-person interaction, person-person interaction, latent variable framework, tree structure, action context, AC, Context, Feature extraction, Biological system modeling, Humans, Adaptation models, Vectors, Context modeling, latent structured models, Group activity recognition
Citation:
Weilong Yang, Yang Wang, Tian Lan, S. N. Robinovitch, G. Mori, "Discriminative Latent Models for Recognizing Contextual Group Activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 8, pp. 1549-1562, Aug. 2012, doi:10.1109/TPAMI.2011.228