The Action Similarity Labeling Challenge
March 2012 (vol. 34 no. 3)
pp. 615-621
T. Hassner, Dept. of Math. & Comput. Sci., Open Univ. of Israel, Raanana, Israel
O. Kliper-Gross, Dept. of Math. & Comput. Sci., Weizmann Inst. of Sci., Rehovot, Israel
L. Wolf, Blavatnik Sch. of Comput. Sci., Tel Aviv Univ., Tel Aviv, Israel
Recognizing actions in videos is rapidly becoming a topic of much research. To facilitate the development of methods for action recognition, several video collections, along with benchmark protocols, have previously been proposed. In this paper, we present a novel video database, the “Action Similarity LAbeliNg” (ASLAN) database, along with benchmark protocols. The ASLAN set includes thousands of videos collected from the web, in over 400 complex action classes. Our benchmark protocols focus on action similarity (same/not-same), rather than action classification, and testing is performed on never-before-seen actions. We propose this data set and benchmark as a means for gaining a more principled understanding of what makes actions different or similar, rather than learning the properties of particular action classes. We present baseline results on our benchmark, and compare them to human performance. To promote further study of action similarity techniques, we make the ASLAN database, benchmarks, and descriptor encodings publicly available to the research community.
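The same/not-same benchmark described above can be sketched as pair classification: score each pair of video descriptors by a similarity measure, then pick a decision threshold by cross-validation, with accuracy averaged over folds. The sketch below uses synthetic descriptor vectors and cosine similarity purely for illustration; the actual ASLAN descriptors, splits, and baseline methods are those defined in the paper and its released encodings.

```python
# Illustrative sketch of a same/not-same pair-classification protocol
# (in the spirit of ASLAN, and of LFW [8] before it).
# All data here is SYNTHETIC; descriptor extraction from video is out of scope.
import numpy as np

rng = np.random.default_rng(0)

def make_pairs(n_pairs, dim=64):
    """Synthetic stand-in for pairs of video descriptors.
    Same-action pairs are correlated; not-same pairs are independent."""
    X1, X2, y = [], [], []
    for i in range(n_pairs):
        same = i % 2 == 0
        a = rng.normal(size=dim)
        b = a + rng.normal(scale=0.4, size=dim) if same else rng.normal(size=dim)
        X1.append(a); X2.append(b); y.append(1 if same else 0)
    return np.array(X1), np.array(X2), np.array(y)

def cosine_sim(a, b):
    """Row-wise cosine similarity between two batches of descriptors."""
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def cross_validated_accuracy(sims, labels, folds=10):
    """Threshold-based same/not-same classification: on each fold, choose
    the similarity threshold maximizing training accuracy, then test it
    on the held-out pairs."""
    idx = np.array_split(np.arange(len(sims)), folds)
    accs = []
    for k in range(folds):
        test = idx[k]
        train = np.concatenate([idx[j] for j in range(folds) if j != k])
        # Candidate thresholds are the training similarities themselves.
        best_t = max(sims[train],
                     key=lambda t: np.mean((sims[train] >= t) == labels[train]))
        accs.append(np.mean((sims[test] >= best_t) == labels[test]))
    return float(np.mean(accs))

X1, X2, y = make_pairs(600)
acc = cross_validated_accuracy(cosine_sim(X1, X2), y)
print(f"mean cross-validated accuracy: {acc:.2f}")
```

Note that in the actual benchmark the test folds contain action classes never seen in training, so a method must learn a general notion of action similarity rather than class-specific models; the synthetic data above does not capture that split.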

References:
[1] C. Schuldt, I. Laptev, and B. Caputo, “Recognizing Human Actions: A Local SVM Approach,” Proc. 17th Int'l Conf. Pattern Recognition, vol. 3, pp. 32-36, 2004.
[2] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, “Actions as Space-Time Shapes,” Proc. IEEE Int'l Conf. Computer Vision, pp. 1395-1402, 2005.
[3] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, “Learning Realistic Human Actions from Movies,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[4] M. Marszalek, I. Laptev, and C. Schmid, “Actions in Context,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 2929-2936, 2009.
[5] J. Liu, J. Luo, and M. Shah, “Recognizing Realistic Actions from Videos ‘in the Wild’,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1996-2003, 2009.
[6] A. Torralba, R. Fergus, and W.T. Freeman, “80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1958-1970, Nov. 2008.
[7] G. Griffin, A. Holub, and P. Perona, “Caltech-256 Object Category Dataset,” Technical Report 7694, California Inst. of Technology, 2007.
[8] G.B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments,” Technical Report 07-49, Univ. of Massachusetts, Amherst, 2007.
[9] A. Veeraraghavan, R. Chellappa, and A.K. Roy-Chowdhury, “The Function Space of an Activity,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 959-968, 2006.
[10] D. Weinland, R. Ronfard, and E. Boyer, “Free Viewpoint Action Recognition Using Motion History Volumes,” Computer Vision and Image Understanding, vol. 104, nos. 2/3, pp. 249-257, 2006.
[11] M.D. Rodriguez, J. Ahmed, and M. Shah, “Action Mach: A Spatio-Temporal Maximum Average Correlation Height Filter for Action Recognition,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[12] K. Mikolajczyk and H. Uemura, “Action Recognition with Motion-Appearance Vocabulary Forest,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[13] L. Yeffet and L. Wolf, “Local Trinary Patterns for Human Action Recognition,” Proc. IEEE 12th Int'l Conf. Computer Vision, pp. 492-497, 2009.
[14] R. Messing, C. Pal, and H. Kautz, “Activity Recognition Using the Velocity Histories of Tracked Keypoints,” Proc. IEEE 12th Int'l Conf. Computer Vision, pp. 104-111, 2009.
[15] A. Patron-Perez, M. Marszalek, A. Zisserman, and I. Reid, “High Five: Recognising Human Interactions in TV Shows,” Proc. British Machine Vision Conf., 2010.
[16] J.C. Niebles, C.-W. Chen, and L. Fei-Fei, “Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification,” Proc. 11th European Conf. Computer Vision, pp. 392-405, 2010.
[17] G. Yu, J. Yuan, and Z. Liu, “Unsupervised Random Forest Indexing for Fast Action Search,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 865-872, 2011.
[18] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, “Behavior Recognition via Sparse Spatio-Temporal Features,” Proc. IEEE Int'l Workshop Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65-72, 2005.
[19] J.C. Niebles and L. Fei-Fei, “A Hierarchical Model of Shape and Appearance for Human Action Classification,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2007.
[20] I. Junejo, E. Dexter, I. Laptev, and P. Pérez, “Cross-View Action Recognition from Temporal Self-Similarities,” Proc. 10th European Conf. Computer Vision, pp. 293-306, 2008.
[21] K. Schindler and L.V. Gool, “Action Snippets: How Many Frames Does Human Action Recognition Require?” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[22] A. Kovashka and K. Grauman, “Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 2046-2053, 2010.
[23] M. Raptis and S. Soatto, “Tracklet Descriptors for Action Modeling and Video Analysis,” Proc. 11th European Conf. Computer Vision, pp. 577-590, 2010.
[24] W. Kim, J. Lee, M. Kim, D. Oh, and C. Kim, “Human Action Recognition Using Ordinal Measure of Accumulated Motion,” EURASIP J. Advances in Signal Processing, vol. 2010, pp. 1-11, 2010.
[25] D. Weinland, M. Ozuysal, and P. Fua, “Making Action Recognition Robust to Occlusions and Viewpoint Changes,” Proc. 11th European Conf. Computer Vision, pp. 635-648, 2010.
[26] A. Gilbert, J. Illingworth, and R. Bowden, “Action Recognition Using Mined Hierarchical Compound Features,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 883-897, May 2011.
[27] H. Wang, A. Klaser, C. Schmid, and C.-L. Liu, “Action Recognition by Dense Trajectories,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 3169-3176, 2011.
[28] A. Gaidon, M. Marszalek, and C. Schmid, “Mining Visual Actions from Movies,” Proc. British Machine Vision Conf., p. 128, 2009.
[29] N. Ikizler and D.A. Forsyth, “Searching for Complex Human Activities with No Visual Examples,” Int'l J. Computer Vision, vol. 80, no. 3, pp. 337-357, 2008.
[30] S. Zanetti, L. Zelnik-Manor, and P. Perona, “A Walk through the Web's Video Clips,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition Workshops, pp. 1-8, 2008.
[31] Z. Wang, M. Zhao, Y. Song, S. Kumar, and B. Li, “Youtubecat: Learning to Categorize Wild Web Videos,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[32] L. Duan, D. Xu, I.W. Tsang, and J. Luo, “Visual Event Recognition in Videos by Learning from Web Data,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[33] T.S. Chua, S. Tang, R. Trichet, H.K. Tan, and Y. Song, “Moviebase: A Movie Database for Event Detection and Behavioral Analysis,” Proc. First Workshop Web-Scale Multimedia Corpus, pp. 41-48, 2009.
[34] N. Ikizler-Cinbis and S. Sclaroff, “Object, Scene and Actions: Combining Multiple Features for Human Action Recognition,” Proc. 11th European Conf. Computer Vision, pp. 494-507, 2010.
[35] P. Matikainen, M. Hebert, and R. Sukthankar, “Representing Pairwise Spatial and Temporal Relations for Action Recognition,” Proc. 11th European Conf. Computer Vision, pp. 508-521, 2010.
[36] L. Zelnik-Manor and M. Irani, “Statistical Analysis of Dynamic Actions,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1530-1535, Sept. 2006.
[37] E. Shechtman and M. Irani, “Matching Local Self-Similarities across Images and Videos,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2007.
[38] A. Farhadi and M. Tabrizi, “Learning to Recognize Activities from the Wrong View Point,” Proc. 10th European Conf. Computer Vision, pp. 154-166, 2008.
[39] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba, “Sun Database: Large-Scale Scene Recognition from Abbey to Zoo,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 3485-3492, 2010.
[40] L. Wolf, R. Littman, N. Mayer, T. German, N. Dershowitz, R. Shweka, and Y. Choueka, “Identifying Join Candidates in the Cairo Genizah,” Int'l J. Computer Vision, vol. 94, pp. 118-135, 2011.
[41] L. Wolf, T. Hassner, and I. Maoz, “Face Recognition in Unconstrained Videos with Matched Background Similarity,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[42] A. Ferencz, E. Learned-Miller, and J. Malik, “Building a Classification Cascade for Visual Identification from One Example,” Proc. 10th IEEE Int'l Conf. Computer Vision, vol. 1, pp. 286-293, 2005.
[43] L. Wolf, T. Hassner, and Y. Taigman, “Descriptor Based Methods in the Wild,” Proc. Faces in Real-Life Images Workshop in European Conf. Computer Vision, 2008.
[44] M. Sargin, H. Aradhye, P. Moreno, and M. Zhao, “Audiovisual Celebrity Recognition in Unconstrained Web Videos,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 1977-1980, 2009.
[45] C.-C. Chang and C.-J. Lin, “LIBSVM: A Library for Support Vector Machines,” ACM Trans. Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1-27:27, http://www.csie.ntu.edu.tw/~cjlin/libsvm/, 2011.
[46] D.H. Wolpert, “Stacked Generalization,” Neural Networks, vol. 5, no. 2, pp. 241-259, 1992.

Index Terms:
Action recognition, action similarity, action similarity labeling, action classes, video databases, video collections, benchmark protocols, benchmark testing, descriptor encodings, ASLAN database, web videos, web services, YouTube, computer vision, pattern recognition, training, cameras
Citation:
T. Hassner, O. Kliper-Gross, L. Wolf, "The Action Similarity Labeling Challenge," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 3, pp. 615-621, March 2012, doi:10.1109/TPAMI.2011.209