Computer Vision, IEEE International Conference on (2011)
Barcelona, Spain
Nov. 6, 2011 to Nov. 13, 2011
ISBN: 978-1-4577-1101-5
pp: 2003-2010
Tian Lan, School of Computing Science, Simon Fraser University, Canada
Yang Wang, Dept. of Computer Science, UIUC, USA
Greg Mori, School of Computing Science, Simon Fraser University, Canada
In this paper we develop an algorithm for action recognition and localization in videos. The algorithm uses a figure-centric visual word representation. Unlike previous approaches, it does not require reliable human detection and tracking as input. Instead, the person location is treated as a latent variable that is inferred simultaneously with action recognition. A spatial model for an action is learned in a discriminative fashion under the figure-centric representation, and temporal smoothness over video sequences is also enforced. We present results on the UCF-Sports dataset, verifying the effectiveness of our model in situations where detection and tracking of individuals are challenging.
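The idea of inferring a latent person location together with the action label can be illustrated with a small sketch. It is not the model from the paper: it assumes each frame already comes with a handful of candidate locations and a per-action score for each candidate, and it uses a squared-distance penalty between consecutive locations as a stand-in for temporal smoothness. The names `best_track`, `recognize_and_localize`, and `smooth_weight` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def best_track(unary: List[List[float]],
               centers: List[List[Point]],
               smooth_weight: float) -> Tuple[float, List[int]]:
    """Dynamic programming (Viterbi) over candidate locations.

    unary[t][k]   : assumed score of candidate k in frame t for one action.
    centers[t][k] : (x, y) center of candidate k in frame t.
    Returns the best total score and the chosen candidate index per frame.
    """
    best = list(unary[0])           # best score of any path ending at each candidate
    back: List[List[int]] = []      # back-pointers for frames 1..T-1
    for t in range(1, len(unary)):
        cur, ptr = [], []
        for k, (cx, cy) in enumerate(centers[t]):
            # Score of arriving at candidate k from each previous candidate j,
            # penalized by the squared displacement (assumed smoothness term).
            cands = [
                best[j] - smooth_weight * ((cx - px) ** 2 + (cy - py) ** 2)
                for j, (px, py) in enumerate(centers[t - 1])
            ]
            j_best = max(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[j_best] + unary[t][k])
            ptr.append(j_best)
        best = cur
        back.append(ptr)
    # Trace the best path backwards from the best final candidate.
    k = max(range(len(best)), key=best.__getitem__)
    total = best[k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return total, path


def recognize_and_localize(unary_by_action: Dict[str, List[List[float]]],
                           centers: List[List[Point]],
                           smooth_weight: float = 1.0
                           ) -> Tuple[str, List[int], float]:
    """Pick the action whose best smooth location track scores highest."""
    results = {
        action: best_track(unary, centers, smooth_weight)
        for action, unary in unary_by_action.items()
    }
    action = max(results, key=lambda a: results[a][0])
    score, path = results[action]
    return action, path, score


# Toy input: 3 frames, 2 candidate locations per frame (made-up numbers).
centers = [[(0.0, 0.0), (10.0, 0.0)] for _ in range(3)]
scores = {
    "run":  [[1.0, 0.0], [0.0, 1.5], [1.0, 0.0]],
    "walk": [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],
}
smooth_result = recognize_and_localize(scores, centers, smooth_weight=1.0)
free_result = recognize_and_localize(scores, centers, smooth_weight=0.0)
```

In this toy input, the "run" scores favor a track that jumps between the two candidates, so "run" wins only when the smoothness penalty is switched off. With the penalty on, the steady "walk" track scores higher, which shows how the location choice and the label decision influence each other.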

