Issue No. 03 - March 2013 (vol. 35)
pp: 527-540
K. G. Derpanis , Dept. of Comput. Sci. & Eng., York Univ., Toronto, ON, Canada
M. Sizintsev , Dept. of Comput. Sci. & Eng., York Univ., Toronto, ON, Canada
K. J. Cannons , Dept. of Comput. Sci. & Eng., York Univ., Toronto, ON, Canada
R. P. Wildes , Dept. of Comput. Sci. & Eng., York Univ., Toronto, ON, Canada
ABSTRACT
This paper provides a unified framework for the interrelated topics of action spotting, the spatiotemporal detection and localization of human actions in video, and action recognition, the classification of a given video into one of several predefined categories. A novel compact local descriptor of video dynamics in the context of action spotting and recognition is introduced based on visual spacetime oriented energy measurements. This descriptor is efficiently computed directly from raw image intensity data and thereby forgoes the problems typically associated with flow-based features. Importantly, the descriptor allows for the comparison of the underlying dynamics of two spacetime video segments irrespective of spatial appearance, such as differences induced by clothing, and with robustness to clutter. An associated similarity measure is introduced that admits efficient exhaustive search for an action template, derived from a single exemplar video, across candidate video sequences. The general approach presented for action spotting and recognition is amenable to efficient implementation, which is deemed critical for many important applications. For action spotting, details of a real-time GPU-based instantiation of the proposed approach are provided. Empirical evaluation of both action spotting and action recognition on challenging datasets suggests the efficacy of the proposed approach, with state-of-the-art performance documented on standard datasets.
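The descriptor outlined above measures spacetime oriented energy directly from image intensities and normalizes it so that two video segments can be compared by their dynamics rather than their appearance. As an illustration only, the following sketch approximates that idea with second Gaussian derivatives along the three spacetime axes standing in for the paper's full steerable-filter orientation set, and the Bhattacharyya coefficient as the histogram similarity; the function names and the three-axis simplification are assumptions for this example, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def oriented_energies(video, sigma=1.5):
    """Per-voxel oriented energy distribution for a (t, y, x) volume.

    Simplification: one energy channel per spacetime axis, computed as the
    squared second Gaussian derivative along that axis, instead of the full
    set of steerable-filter orientations used in the paper.
    """
    channels = []
    for axis in range(3):
        order = [0, 0, 0]
        order[axis] = 2  # second derivative along one spacetime axis
        resp = gaussian_filter(video, sigma=sigma, order=order)
        channels.append(resp ** 2)
    e = np.stack(channels, axis=-1)
    # Per-voxel normalization discounts overall contrast, so the result
    # reflects local dynamics rather than spatial appearance.
    return e / (e.sum(axis=-1, keepdims=True) + 1e-9)

def patch_signature(energy, t0, y0, x0, shape):
    """Aggregate a spacetime patch into one orientation histogram."""
    t, y, x = shape
    patch = energy[t0:t0 + t, y0:y0 + y, x0:x0 + x]
    hist = patch.reshape(-1, patch.shape[-1]).mean(axis=0)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Similarity of two normalized histograms, in (0, 1]."""
    return float(np.sum(np.sqrt(p * q)))

# Toy usage: compare the dynamics of two windows in a random test volume.
rng = np.random.default_rng(0)
video = rng.random((16, 32, 32))
e = oriented_energies(video)
sig_a = patch_signature(e, 0, 0, 0, (8, 16, 16))
sig_b = patch_signature(e, 8, 16, 16, (8, 16, 16))
print(bhattacharyya(sig_a, sig_b))
```

In a spotting setting, the template's signature would be slid over every candidate window of the search video and the similarity maximized; because each signature is a small normalized histogram, that exhaustive search stays cheap, which is what makes the real-time GPU instantiation described in the paper practical.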
INDEX TERMS
Spatiotemporal phenomena, Clutter, Energy measurement, Visualization, Dynamics, Robustness, Cameras, real-time implementations, Action spotting, action recognition, action representation, human motion, visual spacetime, spatiotemporal orientation, template matching
CITATION
K. G. Derpanis, M. Sizintsev, K. J. Cannons, R. P. Wildes, "Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.35, no. 3, pp. 527-540, March 2013, doi:10.1109/TPAMI.2012.141