2010 International Conference on Digital Image Computing: Techniques and Applications (2010)
Sydney, New South Wales, Australia
Dec. 1, 2010 to Dec. 3, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DICTA.2010.60
This paper presents a unified framework for recognizing human actions in video using human pose estimation. Because human appearance varies widely and backgrounds are noisy, accurate human pose analysis is hard to achieve and is rarely employed for action recognition. Our approach takes advantage of recent success in human detection and the view invariance of local feature-based approaches to design a pose-based action recognition system. We begin with a frame-wise human detection step that initializes the search space for local body parts, then integrate the detected parts into a human kinematic structure using a tree-structured graphical model. The final articulation configuration is then used to infer the action class being performed, based on the behavior of each individual part and the variation of the overall structure. We also show that, even with imprecise pose estimation, accurate action recognition can still be achieved from informative cues in the overall part configuration. Results on an action recognition benchmark show that the proposed framework is comparable to existing state-of-the-art action recognition algorithms.
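The step of integrating detected parts into a tree-structured kinematic model can be illustrated with a generic max-sum (dynamic programming) pass over a tree. The sketch below is not the authors' implementation; the function name `infer_pose`, the quadratic deformation cost, the `weight` parameter, and the input layout are all assumptions made for illustration. It only shows how per-part detector scores and parent-child geometry might be combined into one best configuration.

```python
import numpy as np

def infer_pose(parent, candidates, unary, offsets, weight=1.0):
    """Max-sum inference over a tree of body parts (illustrative sketch).

    parent[i]     -- index of part i's parent, -1 for the root at index 0;
                     every parent must have a smaller index than its children
    candidates[i] -- (K_i, 2) candidate (x, y) locations for part i
    unary[i]      -- (K_i,) detector scores for those candidates
    offsets[i]    -- assumed expected (dx, dy) of part i relative to its parent
    """
    n = len(parent)
    cands = [np.asarray(c, dtype=float) for c in candidates]
    offs = [np.asarray(o, dtype=float) for o in offsets]
    msg = [np.array(u, dtype=float) for u in unary]  # accumulated subtree scores
    best_child = [None] * n

    # Leaves to root: children have larger indices, so they are handled first.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        # diff[a, b]: deviation of child candidate b from its expected
        # position given parent candidate a.
        diff = cands[i][None, :, :] - cands[p][:, None, :] - offs[i]
        pairwise = -weight * np.sum(diff ** 2, axis=2)  # assumed quadratic cost
        total = pairwise + msg[i][None, :]
        best_child[i] = np.argmax(total, axis=1)
        msg[p] = msg[p] + np.max(total, axis=1)

    # Root to leaves: read off the best candidate for each part.
    choice = [0] * n
    choice[0] = int(np.argmax(msg[0]))
    for i in range(1, n):
        choice[i] = int(best_child[i][choice[parent[i]]])
    return choice, float(msg[0][choice[0]])
```

Under these assumptions, the returned list gives one chosen candidate index per part, and the score is the total of the unary and pairwise terms for that configuration. A downstream classifier could then take the selected part locations as features for the action class.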
J. Zhang, L. Cheng, L. Wang and T. H. Thi, "Human Action Recognition from Boosted Pose Estimation," 2010 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Sydney, New South Wales, Australia, 2010, pp. 308-313.