Benjamin Yao, University of California, Los Angeles, Los Angeles
Zicheng Liu, Microsoft Research, Redmond
Xiaohan Nie, University of California, Los Angeles, Los Angeles
Song-Chun Zhu, University of California, Los Angeles, Los Angeles
This paper presents Animated Pose Templates for detecting short-term, long-term, and contextual actions in cluttered video scenes. Each pose template consists of two components: i) a shape template whose appearance is represented by Histogram of Oriented Gradient features; and ii) a motion template represented by Histogram of Optical Flow features. While this pose template is suitable for detecting short-term action snippets, we extend it in two ways: i) for long-term actions, we animate the pose templates by adding temporal constraints in a Hidden Markov Model; and ii) for contextual actions, we treat contextual objects as additional parts of the pose templates. To train the model, we manually annotate part locations on some key frames, then introduce a Semi-Supervised Structural SVM algorithm that iterates between: i) learning model parameters from labeled data by solving a structural SVM optimization; and ii) imputing latent variables on unannotated frames and progressively accepting high-scoring ones as newly labeled examples. The inference algorithm has two steps: i) detecting top candidates for the pose templates; and ii) computing the sequence of pose templates. Both steps are done by dynamic programming. In experiments, we test our method on both public datasets and our own. The results show that our model achieves performance comparable to or better than the state of the art.
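As an illustration of the second inference step, the following is a minimal sketch of Viterbi-style dynamic programming over a sequence of frames, assuming each frame already has one detection score per pose template and that temporal constraints are given as pairwise transition scores. The function name `best_template_sequence` and the inputs `frame_scores` and `transition` are hypothetical and are not taken from the paper; the paper's actual scoring functions and model structure are not reproduced here.

```python
from typing import List, Sequence, Tuple


def best_template_sequence(
    frame_scores: Sequence[Sequence[float]],
    transition: Sequence[Sequence[float]],
) -> Tuple[float, List[int]]:
    """Pick one pose template per frame so that the sum of per-frame
    detection scores and pairwise transition scores is maximal.

    frame_scores[t][j]: assumed detection score of template j at frame t.
    transition[i][j]:   assumed score for moving from template i to j.
    """
    if not frame_scores:
        return 0.0, []

    n_states = len(frame_scores[0])
    best = list(frame_scores[0])   # best[j]: best score of a path ending in j
    back: List[List[int]] = []     # back-pointers, one list per frame t >= 1

    for t in range(1, len(frame_scores)):
        new_best = []
        pointers = []
        for j in range(n_states):
            # Best predecessor template for template j at frame t.
            i_star = max(range(n_states), key=lambda i: best[i] + transition[i][j])
            new_best.append(best[i_star] + transition[i_star][j] + frame_scores[t][j])
            pointers.append(i_star)
        best = new_best
        back.append(pointers)

    # Trace the highest-scoring path backwards from the last frame.
    last = max(range(n_states), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


if __name__ == "__main__":
    # Toy example with two templates and three frames (made-up numbers).
    scores = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    trans = [[0.0, -0.5], [-2.0, 0.0]]
    print(best_template_sequence(scores, trans))
```

The cost of this sketch is linear in the number of frames and quadratic in the number of templates, which is the usual reason for choosing dynamic programming over enumerating all template sequences.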
Dynamic programming, Video analysis, Representations, data structures, and transforms, Motion
Benjamin Yao, Zicheng Liu, Xiaohan Nie, Song-Chun Zhu, "Animated Pose Templates for Modelling and Detecting Human Actions", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. , no. , pp. 0, 5555, doi:10.1109/TPAMI.2013.144