Wongun Choi, NEC Laboratories, USA
Silvio Savarese, Stanford University, Stanford
This paper presents a principled framework for analyzing collective activities at different levels of semantic granularity from videos. Our framework is capable of jointly tracking multiple individuals, recognizing activities performed by individuals in isolation (i.e., atomic activities such as walking or standing), recognizing the interactions between pairs of individuals (i.e., interaction activities), and understanding the activities of a group of individuals (i.e., collective activities). A key property of our work is that it coherently combines bottom-up information stemming from detections or fragments of tracks (tracklets) with top-down evidence. Top-down evidence is provided by a newly proposed descriptor that captures the coherent behavior of groups of individuals in a spatial-temporal neighborhood of the sequence. This evidence provides contextual information for establishing accurate associations between detections or tracklets across frames and, thus, for obtaining more robust tracking results. Bottom-up evidence percolates upwards so as to automatically infer collective activity labels. Experimental results on two challenging datasets support our theoretical claims and indicate that our model achieves enhanced tracking results and the best collective classification results to date.
Computer vision, Vision and Scene Understanding, Video analysis, Tracking, Structural
W. Choi and S. Savarese, "Understanding Collective Activities of People from Videos," in IEEE Transactions on Pattern Analysis & Machine Intelligence.