Issue No. 05 - May (2011 vol. 33)
pp. 867-882
Hae Jong Seo, University of California, Santa Cruz
Peyman Milanfar, University of California, Santa Cruz
ABSTRACT
We present a novel action recognition method based on space-time locally adaptive regression kernels and the matrix cosine similarity measure. The proposed method uses a single example of an action as a query to find similar matches. It requires no prior knowledge about actions, no foreground/background segmentation, and no motion estimation or tracking. The method computes novel space-time descriptors from the query video that measure the likeness of each voxel to its surroundings. Salient features are extracted from these descriptors and compared against analogous features from the target video. This comparison uses a matrix generalization of the cosine similarity measure. The algorithm yields a scalar resemblance volume, in which each voxel indicates the likelihood of similarity between the query video and the target-video cube centered at that voxel. Using nonparametric significance tests that control the false discovery rate, we detect the presence and location of actions similar to the query video. The method performs well on challenging action data sets containing fast motions, varied contexts, and complicated backgrounds. Further experiments on the Weizmann and KTH data sets demonstrate state-of-the-art performance in action categorization.
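The abstract names two computational steps that are easy to misread in prose: the matrix cosine similarity between query and target feature matrices, and false-discovery-rate control over the resulting resemblance values. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the matrix cosine similarity is the Frobenius inner product of two equally sized feature matrices divided by the product of their Frobenius norms, assumes the mapping f(ρ) = ρ²/(1 − ρ²) from similarity to resemblance, and applies the standard Benjamini-Hochberg step-up procedure to p-values supplied by the caller (how per-voxel p-values are obtained is not stated in this abstract). All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matrix_cosine_similarity(fq: Matrix, ft: Matrix) -> float:
    """Assumed form: <FQ, FT>_F / (||FQ||_F * ||FT||_F) for same-shaped matrices."""
    if len(fq) != len(ft) or any(len(a) != len(b) for a, b in zip(fq, ft)):
        raise ValueError("feature matrices must have the same shape")
    # Frobenius inner product: sum of elementwise products.
    inner = sum(a * b for ra, rb in zip(fq, ft) for a, b in zip(ra, rb))
    norm_q = math.sqrt(sum(a * a for row in fq for a in row))
    norm_t = math.sqrt(sum(b * b for row in ft for b in row))
    if norm_q == 0.0 or norm_t == 0.0:
        return 0.0  # an all-zero matrix has no direction to compare
    return inner / (norm_q * norm_t)


def resemblance(rho: float, eps: float = 1e-12) -> float:
    """Assumed monotone mapping f(rho) = rho^2 / (1 - rho^2)."""
    # Clamp rho^2 just below 1 so that |rho| = 1 does not divide by zero.
    r2 = min(rho * rho, 1.0 - eps)
    return r2 / (1.0 - r2)


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Standard Benjamini-Hochberg step-up procedure; True marks a rejected null."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose sorted p-value satisfies p_(k) <= k * alpha / m
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:  # reject every hypothesis up to and including rank k
        rejected[idx] = True
    return rejected
```

For example, `benjamini_hochberg([0.01, 0.03, 0.035, 0.04])` rejects all four hypotheses even though the second-smallest p-value exceeds its own threshold, because the step-up rule keeps every hypothesis up to the largest passing rank.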
INDEX TERMS
Action recognition, space-time descriptor, correlation, regression analysis.
CITATION
Hae Jong Seo and Peyman Milanfar, "Action Recognition from One Example," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 867-882, May 2011, doi:10.1109/TPAMI.2010.156