Oct. 13, 2003 to Oct. 16, 2003
Josef Sivic , University of Oxford, United Kingdom
Andrew Zisserman , University of Oxford, United Kingdom
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination, and partial occlusion. The temporal continuity of the video within a shot is used to track the regions, rejecting unstable regions and reducing the effects of noise in the descriptors.

The analogy with text retrieval lies in the implementation: matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. As a result, retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google.

The method is illustrated by matching on two full-length feature films.
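The text-retrieval machinery described above (quantized descriptors treated as "visual words", an inverted file, and tf-idf document ranking) can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: the frame contents and word IDs are invented, and real systems would quantize SIFT-like descriptors into a large vocabulary rather than use hand-picked integers.

```python
import math
from collections import Counter, defaultdict

# Toy corpus: each key frame is a bag of quantized descriptor IDs
# ("visual words"). These IDs and frames are purely hypothetical.
frames = {
    "frame_01": [3, 7, 7, 12, 31],
    "frame_02": [7, 12, 12, 44],
    "frame_03": [3, 3, 9, 31, 31],
}

# Inverted file: visual word -> set of frames containing it.
inverted = defaultdict(set)
for frame_id, words in frames.items():
    for w in words:
        inverted[w].add(frame_id)

n_frames = len(frames)

def tfidf_vector(words):
    """Weight each visual word by term frequency * inverse document frequency."""
    counts = Counter(words)
    total = len(words)
    return {
        w: (c / total) * math.log(n_frames / len(inverted[w]))
        for w, c in counts.items() if w in inverted
    }

def rank(query_words):
    """Score only the frames that share at least one visual word with the query."""
    q = tfidf_vector(query_words)
    candidates = set().union(*(inverted[w] for w in q)) if q else set()
    scores = {}
    for frame_id in candidates:
        f = tfidf_vector(frames[frame_id])
        scores[frame_id] = sum(q[w] * f.get(w, 0.0) for w in q)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# A query region quantizes to words 3 and 31; only matching frames are scored.
print(rank([3, 31]))
```

The inverted file is what makes retrieval immediate: a query touches only the posting lists of its own visual words, so frames sharing no words with the query are never scored at all.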
Josef Sivic, Andrew Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos", Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2003, pp. 1470, doi:10.1109/ICCV.2003.1238663