Computer Vision, IEEE International Conference on (2003)
Oct. 13, 2003 to Oct. 16, 2003
Josef Sivic, University of Oxford, United Kingdom
Andrew Zisserman, University of Oxford, United Kingdom
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors.

The analogy with text retrieval is in the implementation, where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google.

The method is illustrated for matching on two full-length feature films.
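
The text-retrieval analogy can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names only vector quantization, inverted files and document ranking, so everything else here is an assumption. In particular, the centroids are taken as given (no clustering step is shown), tf-idf weighting with cosine similarity is assumed as one common ranking scheme, and the function names `quantize`, `build_index`, `tfidf` and `rank` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptor, centroids):
    """Vector quantization: map a descriptor to the index of its nearest
    centroid. That index plays the role of a 'visual word'."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, centroids[i])),
    )


def build_index(frames, centroids):
    """Pre-compute visual words for every key frame.

    frames: {frame_id: [descriptor, ...]}
    Returns (inverted, counts): the inverted file maps each visual word to
    the set of frames containing it; counts holds per-frame word counts.
    """
    inverted = defaultdict(set)
    counts = {}
    for frame_id, descriptors in frames.items():
        words = Counter(quantize(d, centroids) for d in descriptors)
        counts[frame_id] = words
        for word in words:
            inverted[word].add(frame_id)
    return inverted, counts


def tfidf(words, inverted, n_frames):
    """Assumed weighting: term frequency times log inverse document frequency."""
    total = sum(words.values())
    vector = {}
    for word, n in words.items():
        df = len(inverted.get(word, ()))
        if df:
            vector[word] = (n / total) * math.log(n_frames / df)
    return vector


def rank(query_descriptors, inverted, counts, centroids):
    """Return [(frame_id, score), ...] sorted by decreasing cosine similarity.
    Only frames sharing at least one visual word with the query are scored."""
    n_frames = len(counts)
    query_words = Counter(quantize(d, centroids) for d in query_descriptors)
    q = tfidf(query_words, inverted, n_frames)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    candidates = set().union(*[inverted.get(w, set()) for w in q])
    results = []
    for frame_id in candidates:
        f = tfidf(counts[frame_id], inverted, n_frames)
        f_norm = math.sqrt(sum(v * v for v in f.values()))
        dot = sum(weight * f.get(word, 0.0) for word, weight in q.items())
        score = dot / (q_norm * f_norm) if q_norm and f_norm else 0.0
        results.append((frame_id, score))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Because the quantization and the inverted file are built once, ahead of time, a query only needs to look up the frames listed under its own visual words, which is what makes ranked retrieval fast in this style of system.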
Josef Sivic, Andrew Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos", Computer Vision, IEEE International Conference on, vol. 02, pp. 1470, 2003, doi:10.1109/ICCV.2003.1238663