Issue No. 07 - July 2013 (vol. 35)
pp. 1635-1648
Yang Yang , Dept. of Electr. Eng. & Comput. Sci. (EECS), Univ. of Central Florida (UCF), Orlando, FL, USA
I. Saleemi , Dept. of Electr. Eng. & Comput. Sci. (EECS), Univ. of Central Florida (UCF), Orlando, FL, USA
M. Shah , Dept. of Electr. Eng. & Comput. Sci. (EECS), Univ. of Central Florida (UCF), Orlando, FL, USA
ABSTRACT
This paper proposes a novel representation of articulated human actions, gestures, and facial expressions. The main goals of the proposed approach are: 1) to enable recognition using very few examples, i.e., one-shot or k-shot learning, and 2) to organize unlabeled datasets meaningfully by unsupervised clustering. The proposed representation is obtained by automatically discovering high-level subactions, or motion primitives, through hierarchical clustering of observed optical flow in the four-dimensional space of spatial location and motion flow. The proposed method is completely unsupervised and, in contrast to state-of-the-art representations such as bag of video words, provides a meaningful representation conducive to visual interpretation and textual labeling. Each primitive depicts an atomic subaction, such as the directional motion of a limb or the torso, and is represented by a mixture of four-dimensional Gaussian distributions. For one-shot and k-shot learning, the primitives discovered in a test video are labeled using KL divergence, and the resulting label sequence can then be represented as a string and matched against similar strings from training videos. The same sequence can also be collapsed into a histogram of primitives or used to learn a hidden Markov model that represents a class. We have performed extensive experiments on recognition by one-shot and k-shot learning, as well as unsupervised action clustering, on six human action and gesture datasets, a composite dataset, and a database of facial expressions. These experiments confirm the validity and discriminative nature of the proposed representation.
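For intuition only, the sketch below illustrates the kind of pipeline the abstract describes: each motion primitive is modeled as a mixture of 4D Gaussians over (x, y, u, v) optical-flow observations, and a primitive discovered in a test video is labeled with the closest training primitive under a KL-based dissimilarity. This is a minimal illustration under stated assumptions, not the authors' implementation: the use of scikit-learn's GaussianMixture, the symmetric-KL matching rule between closest components, and all function names and parameter values (e.g., n_components=3) are hypothetical choices.

# Illustrative sketch, not the paper's code.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_primitive(flow_samples, n_components=3):
    """flow_samples: (N, 4) array of (x, y, u, v) optical-flow observations for one primitive."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='full')
    return gmm.fit(flow_samples)

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ) for d-dimensional Gaussians."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def primitive_distance(gmm_a, gmm_b):
    """Assumed dissimilarity: weighted symmetric KL between each component of A
    and its closest component in B (one of several plausible mixture divergences)."""
    dist = 0.0
    for wa, ma, ca in zip(gmm_a.weights_, gmm_a.means_, gmm_a.covariances_):
        best = min(gaussian_kl(ma, ca, mb, cb) + gaussian_kl(mb, cb, ma, ca)
                   for mb, cb in zip(gmm_b.means_, gmm_b.covariances_))
        dist += wa * best
    return dist

def label_test_primitive(test_gmm, training_primitives):
    """training_primitives: dict mapping primitive label -> fitted GaussianMixture."""
    return min(training_primitives,
               key=lambda label: primitive_distance(test_gmm, training_primitives[label]))

Once every primitive in a test video has been labeled this way, the resulting label sequence could be matched against training sequences as a string, for example with a standard alignment scheme such as Needleman-Wunsch [28], or collapsed into a histogram of primitives, as the abstract describes.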
INDEX TERMS
Humans, Optical imaging, Spatiotemporal phenomena, Training, Vectors, Joints, Histograms, Hidden Markov model, Human actions, one-shot learning, unsupervised clustering, gestures, facial expressions, action representation, action recognition, motion primitives, motion patterns, histogram of motion primitives, motion primitives strings
CITATION
Yang Yang, I. Saleemi, and M. Shah, "Discovering Motion Primitives for Unsupervised Grouping and One-Shot Learning of Human Actions, Gestures, and Expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 7, pp. 1635-1648, July 2013, doi:10.1109/TPAMI.2012.253
REFERENCES
[1] http://gesture.chalearn.org/, 2012.
[2] M. Ahmad and S. Lee, "Human Action Recognition Using Shape and CLG-Motion Flow from Multi-View Image Sequences," Pattern Recognition, vol. 41, pp. 2237-2252, July 2008.
[3] A. Bobick and J. Davis, "Real-Time Recognition of Activity Using Temporal Templates," Proc. Third IEEE Workshop Applications Computer Vision, 1996.
[4] C. Chen and J. Aggarwal, "Recognizing Human Action from a Far Field of View," Proc. IEEE Workshop Motion and Video Computing, 2009.
[5] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[6] T. Darrell and A. Pentland, "Space-Time Gestures," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1993.
[7] J. Davis and A. Bobick, "The Representation and Recognition of Action Using Temporal Templates," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997.
[8] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior Recognition via Sparse Spatio-Temporal Features," Proc. Second IEEE Joint Int'l Workshop Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005.
[9] A. Efros, A. Berg, G. Mori, and J. Malik, "Recognizing Action at a Distance," Proc. Ninth IEEE Int'l Conf. Computer Vision, 2003.
[10] A. Fathi and G. Mori, "Action Recognition by Learning Mid-Level Motion Features," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[11] R. Filipovych and E. Ribeiro, "Learning Human Motion Models from Unsegmented Videos," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[12] A. Gilbert, J. Illingworth, and R. Bowden, "Scale Invariant Action Recognition Using Compound Features Mined from Dense Spatio-Temporal Corners," Proc. European Conf. Computer Vision, 2008.
[13] J. Hoey and J. Little, "Representation and Recognition of Complex Human Motion," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000.
[14] N. Ikizler-Cinbis and S. Sclaroff, "Object, Scene and Actions: Combining Multiple Features for Human Action Recognition," Proc. European Conf. Computer Vision, 2010.
[15] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A Biologically Inspired System for Action Recognition," Proc. 11th IEEE Int'l Conf. Computer Vision, 2007.
[16] Y. Ke, R. Sukthankar, and M. Hebert, "Efficient Visual Event Detection Using Volumetric Features," Proc. 10th IEEE Int'l Conf. Computer Vision, 2005.
[17] Y. Ke, R. Sukthankar, and M. Hebert, "Event Detection in Crowded Videos," Proc. 11th IEEE Int'l Conf. Computer Vision, 2007.
[18] A. Kläser, M. Marszałek, I. Laptev, and C. Schmid, "Will Person Detection Help Bag-of-Features Action Recognition?" Technical Report RR-7373, INRIA Grenoble-Rhône-Alpes, 2010.
[19] A. Kovashka and K. Grauman, "Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[20] L. Kratz and K. Nishino, "Anomaly Detection in Extremely Crowded Scenes Using Spatio-Temporal Motion Pattern Models," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[21] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, "HMDB: A Large Video Database for Human Motion Recognition," Proc. IEEE Int'l Conf. Computer Vision, 2011.
[22] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, "Learning Realistic Human Actions from Movies," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[23] Q. Le, W. Zou, S. Yeung, and A. Ng, "Learning Hierarchical Invariant Spatio-Temporal Features for Action Recognition with Independent Subspace Analysis," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[24] Z. Lin, Z. Jiang, and L. Davis, "Recognizing Actions by Shape-Motion Prototype Trees," Proc. 12th IEEE Int'l Conf. Computer Vision, 2009.
[25] J. Liu, J. Luo, and M. Shah, "Recognizing Realistic Actions from Videos in the Wild," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[26] J. Liu and M. Shah, "Learning Human Action via Information Maximization," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[27] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.
[28] S. Needleman and C. Wunsch, "A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins," J. Molecular Biology, vol. 48, no. 3, pp. 443-453, Mar. 1970.
[29] O. Oreifej, R. Mehran, and M. Shah, "Human Identity Recognition in Aerial Images," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[30] V. Parameswaran and R. Chellappa, "View Invariants for Human Action Recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2003.
[31] L. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
[32] N. Robertson and I. Reid, "Behaviour Understanding in Video: A Combined Method," Proc. 10th IEEE Int'l Conf. Computer Vision, 2005.
[33] M. Rodriguez, J. Ahmed, and M. Shah, "Action MACH: A Spatio-Temporal Maximum Average Correlation Height Filter for Action Recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[34] I. Saleemi, L. Hartung, and M. Shah, "Scene Understanding by Statistical Modeling of Motion Patterns," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[35] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing Human Actions: A Local SVM Approach," Proc. 17th Int'l Conf. Pattern Recognition, 2004.
[36] E. Shechtman, L. Gorelick, M. Blank, M. Irani, and R. Basri, "Actions as Space-Time Shapes," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2247-2253, Dec. 2007.
[37] V. Singh and R. Nevatia, "Action Recognition in Cluttered Dynamic Scenes Using Pose-Specific Part Models," Proc. IEEE Int'l Conf. Computer Vision, 2011.
[38] C. Stauffer and E. Grimson, "Learning Patterns of Activity Using Real-Time Tracking," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 747-757, Aug. 2000.
[39] C. Thurau and V. Hlavac, "Pose Primitive Based Human Action Recognition in Videos or Still Images," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[40] Y. Tian, T. Kanade, and J. Cohn, "Recognizing Action Units for Facial Expression Analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 97-115, Feb. 2001.
[41] K. Tran, I. Kakadiaris, and S. Shah, "Modeling Motion of Body Parts for Action Recognition," Proc. British Machine Vision Conf., 2011.
[42] P. Turaga, R. Chellappa, V. Subrahmanian, and O. Udrea, "Machine Recognition of Human Activities: A Survey," IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 11, pp. 1473-1488, Nov. 2008.
[43] H. Wang, A. Kläser, C. Schmid, and C. Liu, "Action Recognition by Dense Trajectories," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[44] H. Wang, M. Ullah, A. Kläser, I. Laptev, and C. Schmid, "Evaluation of Local Spatio-Temporal Features for Action Recognition," Proc. British Machine Vision Conf., 2009.
[45] X. Wang, X. Ma, and G. Grimson, "Unsupervised Activity Perception by Hierarchical Bayesian Models," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[46] Y. Wang, P. Sabzmeydani, and G. Mori, "Semi-Latent Dirichlet Allocation: A Hierarchical Model for Human Action Recognition," Proc. Second Conf. Human Motion: Understanding, Modeling, Capture and Animation, 2007.
[47] D. Weinland, E. Boyer, and R. Ronfard, "Action Recognition from Arbitrary Views Using 3D Exemplars," Proc. 11th IEEE Int'l Conf. Computer Vision, 2007.
[48] D. Weinland, R. Ronfard, and E. Boyer, "A Survey of Vision-Based Methods for Action Representation, Segmentation and Recognition," Computer Vision and Image Understanding, vol. 115, no. 2, pp. 224-241, 2010.
[49] Y. Yacoob and M. Black, "Parameterized Modeling and Recognition of Activities," Proc. Sixth IEEE Int'l Conf. Computer Vision, 1998.
[50] A. Yilmaz and M. Shah, "Actions Sketch: A Novel Action Representation," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[51] A. Yilmaz and M. Shah, "Recognizing Human Actions in Videos Acquired by Uncalibrated Moving Cameras," Proc. 10th IEEE Int'l Conf. Computer Vision, 2005.
[52] L. Zelnik-Manor and M. Irani, "Event-Based Analysis of Video," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.