Issue No. 07 - July (2013 vol. 35)
ISSN: 0162-8828
pp: 1635-1648
Yang Yang , Dept. of Electr. Eng. & Comput. Sci. (EECS), Univ. of Central Florida (UCF), Orlando, FL, USA
I. Saleemi , Dept. of Electr. Eng. & Comput. Sci. (EECS), Univ. of Central Florida (UCF), Orlando, FL, USA
M. Shah , Dept. of Electr. Eng. & Comput. Sci. (EECS), Univ. of Central Florida (UCF), Orlando, FL, USA
ABSTRACT
This paper proposes a novel representation of articulated human actions, gestures, and facial expressions. The main goals of the proposed approach are: 1) to enable recognition using very few examples, i.e., one-shot or k-shot learning, and 2) to meaningfully organize unlabeled datasets by unsupervised clustering. The proposed representation is obtained by automatically discovering high-level subactions, or motion primitives, through hierarchical clustering of observed optical flow in a four-dimensional space of spatial location and motion flow. The method is completely unsupervised and, in contrast to state-of-the-art representations like bag of video words, provides a meaningful representation conducive to visual interpretation and textual labeling. Each primitive depicts an atomic subaction, such as the directional motion of a limb or the torso, and is represented by a mixture of four-dimensional Gaussian distributions. For one-shot and k-shot learning, the sequence of primitive labels discovered in a test video is obtained using KL divergence; it can then be represented as a string and matched against similar strings from training videos. The same sequence can also be collapsed into a histogram of primitives or used to learn a hidden Markov model representing each class. We have performed extensive experiments on recognition by one-shot and k-shot learning, as well as unsupervised action clustering, on six human action and gesture datasets, a composite dataset, and a database of facial expressions. These experiments confirm the validity and discriminative power of the proposed representation.
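The following is a minimal sketch of the labeling and histogram steps described in the abstract, not the authors' implementation: it assumes each motion primitive has already been summarized by a single four-dimensional Gaussian (mean and covariance) rather than a full mixture, assigns each test primitive the label of the nearest training primitive under the closed-form Gaussian KL divergence, and collapses the resulting label sequence into a histogram of primitives. All function names are illustrative.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N0 || N1) between two multivariate Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(cov1_inv @ cov0)            # tr(S1^{-1} S0)
        + diff @ cov1_inv @ diff              # Mahalanobis term
        - k                                   # dimensionality
        + np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
    )

def label_primitives(test_gaussians, train_gaussians):
    """Assign each test primitive the label (index) of the training primitive
    with the smallest KL divergence; the label sequence can then be matched
    as a string against training videos or collapsed into a histogram."""
    labels = []
    for mu_t, cov_t in test_gaussians:
        kls = [gaussian_kl(mu_t, cov_t, mu_r, cov_r)
               for mu_r, cov_r in train_gaussians]
        labels.append(int(np.argmin(kls)))
    return labels

def primitive_histogram(labels, num_primitives):
    """Collapse a label sequence into a normalized histogram of motion primitives."""
    hist = np.bincount(labels, minlength=num_primitives).astype(float)
    return hist / max(hist.sum(), 1.0)
```

Under these assumptions, two videos can be compared either by string matching (e.g., edit distance) on their label sequences or by a distance between their primitive histograms.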
INDEX TERMS
Humans, Optical imaging, Spatiotemporal phenomena, Training, Vectors, Joints, Histograms, Hidden Markov model, Human actions, one-shot learning, unsupervised clustering, gestures, facial expressions, action representation, action recognition, motion primitives, motion patterns, histogram of motion primitives, motion primitive strings
CITATION
Yang Yang, I. Saleemi, M. Shah, "Discovering Motion Primitives for Unsupervised Grouping and One-Shot Learning of Human Actions, Gestures, and Expressions", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 7, pp. 1635-1648, July 2013, doi:10.1109/TPAMI.2012.253