2007 IEEE Conference on Computer Vision and Pattern Recognition (2007)
Minneapolis, MN, USA
June 17, 2007 to June 22, 2007
Ariadna Quattoni, MIT Computer Science and Artificial Intelligence Laboratory
Michael Collins, MIT Computer Science and Artificial Intelligence Laboratory
Trevor Darrell, MIT Computer Science and Artificial Intelligence Laboratory
Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce, it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional but nevertheless captures the information required to discriminate between image categories. This paper describes a method for learning representations from large quantities of unlabeled images that have associated captions; the goal is to improve learning in future image classification problems. Experiments show that our method significantly outperforms (1) a fully supervised baseline model, (2) a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone, and (3) a model that uses the output of word classifiers trained using captions and unlabeled data. Our current work concentrates on captions as the source of meta-data, but more generally other types of meta-data could be used.
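The PCA baseline in (2) can be sketched as follows. This is a minimal illustration with NumPy, not the authors' implementation: the function name, feature dimensionality, and dataset sizes are all hypothetical, and it assumes images have already been converted to fixed-length visual feature vectors.

```python
import numpy as np

def pca_representation(unlabeled, k):
    """Learn a k-dimensional linear projection from unlabeled feature
    vectors (rows of `unlabeled`), as in a PCA-on-unlabeled-images baseline.
    Returns a function mapping new feature vectors to the low-dim space.
    (Illustrative sketch; names and sizes are hypothetical.)"""
    mean = unlabeled.mean(axis=0)
    centered = unlabeled - mean
    # SVD of the centered data matrix; rows of vt are principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]  # top-k principal components

    def project(x):
        # Center with the mean learned from unlabeled data, then project.
        return (x - mean) @ components.T

    return project

# Hypothetical usage: 500 unlabeled images with 1000-dim visual features,
# reduced to a 50-dim representation for a downstream classifier.
rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(500, 1000))
project = pca_representation(unlabeled, k=50)
low_dim = project(unlabeled)
print(low_dim.shape)  # (500, 50)
```

The same `project` function would then be applied to the small labeled set, so the classifier is trained in the 50-dimensional space rather than the original feature space.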
A. Quattoni, M. Collins and T. Darrell, "Learning Visual Representations using Images with Captions," 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA, 2007, pp. 1-8.