2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2008)
Anchorage, AK, USA
June 23, 2008 to June 28, 2008
ISBN: 978-1-4244-2339-2
pp: 1-6
I. Patras, Electronic Engineering Department, Queen Mary University of London, UK
A. Oikonomopoulos, Computing Department, Imperial College London, UK
M. Pantic, Computing Department, Imperial College London, UK
ABSTRACT
The extraction and quantization of local image and video descriptors for the subsequent creation of visual codebooks is a technique that has proved extremely effective for image and video retrieval applications. In this paper we build on this concept and extract a new set of visual descriptors that are derived from spatiotemporal salient points detected on given image sequences and that provide a local space-time description of the visual activity. The proposed descriptors are based on the geometrical properties of three-dimensional piecewise polynomials, namely B-splines, that are fitted to the spatiotemporal locations of the salient points contained in a given spatiotemporal neighborhood. Our descriptors are inherently translation invariant, while using the scales of the salient points to define the neighborhood dimensions ensures invariance to space-time scaling. Subsequently, a clustering algorithm is applied to the descriptors across the whole dataset to create a codebook of visual verbs, where each verb corresponds to a cluster center. We use the resulting codebook in a ‘bag of verbs’ approach to recover the pose and short-term motion of subjects over a short set of successive frames, and we use Dynamic Time Warping (DTW) to align the sequences in our dataset and structure the recovered poses in time. We define a kernel based on the similarity measure provided by the DTW and use it to classify our examples in a Relevance Vector Machine classification scheme. We present results in a well-established human
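The last two steps of the pipeline in the abstract (DTW alignment followed by a DTW-based kernel) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption made for illustration: the Euclidean per-frame cost, the exponential form of the kernel, the parameter name gamma, and the function names dtw_distance and dtw_kernel. Each sequence is taken to be a list of per-frame feature vectors, for example 'bag of verbs' histograms.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two sequences of feature vectors.

    Assumption: the local cost between two frames is their Euclidean distance.
    """
    # One row per frame, so scalar sequences and vector sequences both work.
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)

    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # frame of a repeated against b[j-1]
                cost[i, j - 1],      # frame of b repeated against a[i-1]
                cost[i - 1, j - 1],  # both sequences advance
            )
    return cost[n, m]


def dtw_kernel(a, b, gamma=1.0):
    """Hypothetical similarity kernel: exp(-gamma * DTW cost)."""
    return float(np.exp(-gamma * dtw_distance(a, b)))


if __name__ == "__main__":
    slow = [[0.0], [0.0], [1.0], [2.0]]
    fast = [[0.0], [1.0], [2.0]]
    # The two sequences differ only in timing, so DTW aligns them at zero cost.
    print(dtw_distance(slow, fast), dtw_kernel(slow, fast))
```

A kernel matrix built by evaluating dtw_kernel on every pair of training sequences could then be passed to a kernel classifier. Kernels of this exponential-DTW form are not guaranteed to be positive semi-definite in general.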
CITATION
I. Patras, A. Oikonomopoulos, M. Pantic, "B-spline polynomial descriptors for human activity recognition", 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-6, 2008, doi:10.1109/CVPRW.2008.4563175