Harvesting Image Databases from the Web
April 2011 (vol. 33 no. 4)
pp. 754-766
Florian Schroff, University of California, San Diego, San Diego
Antonio Criminisi, Microsoft Research Cambridge, Cambridge
Andrew Zisserman, University of Oxford, Oxford
The objective of this work is to automatically generate a large number of images for a specified object class. A multimodal approach employing text, metadata, and visual features is used to gather many high-quality images from the Web. Candidate images are obtained by a text-based Web search querying on the object identifier (e.g., the word "penguin"). The Webpages and the images they contain are downloaded. The task is then to remove irrelevant images and rerank the remainder. First, the images are reranked based on the text surrounding the image and on metadata features. A number of methods are compared for this reranking. Second, the top-ranked images are used as (noisy) training data, and an SVM visual classifier is learned to improve the ranking further. We investigate the sensitivity of the cross-validation procedure to this noisy training data. The principal novelty of the overall method lies in combining text/metadata and visual features in order to achieve a completely automatic ranking of the images. Examples are given for a selection of animals, vehicles, and other classes, totaling 18 classes. The results are assessed by precision/recall curves on ground-truth annotated data and by comparison to previous approaches, including those of Berg and Forsyth and Fergus et al.
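The two-stage ranking described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the text stage simply sums hypothetical per-feature log-odds weights over binary text/metadata features; the visual stage is a Pegasos-style linear SVM trained by stochastic subgradient descent (a stand-in, not the SVMlight package listed in the references); the 2-D "visual" vectors stand in for real image descriptors; and the cross-validation of SVM parameters is omitted. All function names, parameters (such as `top_k`), and data are hypothetical.

```python
# Illustrative sketch only: function names, features, weights and data are hypothetical.
import random
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def text_score(text_feats: Sequence[int], log_odds: Sequence[float]) -> float:
    """Stage 1 (assumed scoring rule): sum the hypothetical log-odds weights of the
    binary text/metadata features that fire, e.g. 'query word in image filename'."""
    return sum(w for f, w in zip(text_feats, log_odds) if f)


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss.
    Labels must be +1/-1; a constant 1.0 is appended to each vector so the last
    weight acts as a (regularised) bias term."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regulariser step
            if margin < 1.0:  # hinge loss active: move w towards label * x
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w: Sequence[float], v: Vector) -> float:
    """Linear decision value, with the bias feature appended as in training."""
    return sum(wi * xi for wi, xi in zip(w, list(v) + [1.0]))


def rerank(images: List[Tuple[Sequence[int], Vector]], log_odds: Sequence[float],
           background: List[Vector], top_k: int) -> Tuple[List[int], List[int]]:
    """Return (text ranking, visual ranking) as lists of image indices."""
    n = len(images)
    by_text = sorted(range(n), key=lambda i: text_score(images[i][0], log_odds),
                     reverse=True)
    # Stage 2: the top_k text-ranked images become (noisy) positives and the
    # hypothetical background images become negatives; the SVM rescores everything.
    positives = [images[i][1] for i in by_text[:top_k]]
    X = positives + list(background)
    y = [1] * len(positives) + [-1] * len(background)
    w = train_linear_svm(X, y)
    by_visual = sorted(range(n), key=lambda i: svm_score(w, images[i][1]),
                       reverse=True)
    return by_text, by_visual


def average_precision(ranking: Sequence[int], relevant: Set[int]) -> float:
    """Non-interpolated AP: mean of precision@rank taken at each relevant item."""
    hits, total = 0, 0.0
    for rank, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            total += hits / rank
    return total / max(len(relevant), 1)


if __name__ == "__main__":
    # Toy data: (binary text features, 2-D "visual" feature) per image.
    images = [([1, 1], (1.0, 0.9)), ([1, 0], (-0.8, -1.0)), ([0, 1], (0.9, 1.1)),
              ([0, 0], (1.1, 1.0)), ([1, 1], (0.8, 1.2)), ([0, 0], (-1.0, -0.9))]
    relevant = {0, 2, 3, 4}
    background = [(-1.0, -1.0), (-0.9, -1.2), (-1.1, -0.8), (-1.2, -1.1)]
    by_text, by_visual = rerank(images, [2.0, 1.0], background, top_k=3)
    print("text AP:  ", average_precision(by_text, relevant))
    print("visual AP:", average_precision(by_visual, relevant))
```

In the toy data, image 1 scores well on text but is visually irrelevant, so with `top_k=3` it enters the SVM's positive set as label noise. This is the kind of noisy training data whose effect on cross-validation the abstract says is investigated.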

[1] J. Aslam and M. Montague, "Models for Metasearch," Proc. ACM Conf. Research and Development in Information Retrieval, pp. 276-284, 2001.
[2] K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M. Jordan, "Matching Words and Pictures," J. Machine Learning Research, vol. 3, pp. 1107-1135, Feb. 2003.
[3] T. Berg, "Animals on the Web Data Set," http://www.tamaraberg.com/animalDataset/index.html, 2006.
[4] T. Berg, A. Berg, J. Edwards, M. Mair, R. White, Y. Teh, E. Learned-Miller, and D. Forsyth, "Names and Faces in the News," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2004.
[5] T.L. Berg and D.A. Forsyth, "Animals on the Web," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[6] D. Blei, A. Ng, and M. Jordan, "Latent Dirichlet Allocation," J. Machine Learning Research, vol. 3, pp. 993-1022, Jan. 2003.
[7] C.K. Chow and C.N. Liu, "Approximating Discrete Probability Distributions with Dependence Trees," IEEE Trans. Information Theory, vol. 14, no. 3, pp. 462-467, May 1968.
[8] B. Collins, J. Deng, K. Li, and L. Fei-Fei, "Towards Scalable Data Set Construction: An Active Learning Approach," Proc. 10th European Conf. Computer Vision, 2008.
[9] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 886-893, 2005.
[10] G. Dorkó and C. Schmid, "Selection of Scale-Invariant Parts for Object Class Recognition," Proc. Ninth Int'l Conf. Computer Vision, 2003.
[11] R. Fergus, P. Perona, and A. Zisserman, "A Visual Category Filter for Google Images," Proc. Eighth European Conf. Computer Vision, May 2004.
[12] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, "Learning Object Categories from Google's Image Search," Proc. 10th Int'l Conf. Computer Vision, 2005.
[13] C. Frankel, M.J. Swain, and V. Athitsos, "Webseer: An Image Search Engine for the World Wide Web," technical report, Univ. of Chicago, 1997.
[14] M. Fritz and B. Schiele, "Decomposition, Discovery and Detection of Visual Categories Using Topic Models," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[15] T. Hofmann, "Unsupervised Learning by Probabilistic Latent Semantic Analysis," Machine Learning, vol. 42, pp. 177-196, 2001.
[16] T. Hofmann, "Probabilistic Latent Semantic Analysis," Proc. Conf. Uncertainty in Artificial Intelligence, 1999.
[17] T. Joachims, "SVMlight," http://svmlight.joachims.org/, 2010.
[18] T. Kadir, A. Zisserman, and M. Brady, "An Affine Invariant Salient Region Detector," Proc. Eighth European Conf. Computer Vision, May 2004.
[19] J. Li, G. Wang, and L. Fei-Fei, "OPTIMOL: Automatic Object Picture Collection via Incremental Model Learning," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[20] W.-H. Lin, R. Jin, and A. Hauptmann, "Web Image Retrieval Re-Ranking with Relevance Model," Proc. IADIS Int'l Conf. WWW/Internet, 2003.
[21] D. Lowe, "Object Recognition from Local Scale-Invariant Features," Proc. Seventh IEEE Int'l Conf. Computer Vision, pp. 1150-1157, Sept. 1999.
[22] K. Mikolajczyk, C. Schmid, and A. Zisserman, "Human Detection Based on a Probabilistic Assembly of Robust Part Detectors," Proc. Eighth European Conf. Computer Vision, May 2004.
[23] K. Morik, P. Brockhausen, and T. Joachims, "Combining Statistical Learning with a Knowledge-Based Approach—A Case Study in Intensive Care Monitoring," Proc. 16th Int'l Conf. Machine Learning, 1999.
[24] Onix, "ONIX Text Retrieval Toolkit," http://www.lextek.com/manuals/onixstopwords1.html, 2010.
[25] M. Porter, R. Boulton, and A. Macfarlane, "The English (Porter2) Stemming Algorithm," http://snowball.tartarus.org/, 2002.
[26] K. Saenko and T. Darrell, "Unsupervised Learning of Visual Sense Models for Polysemous Words," Proc. Conf. Advances in Neural Information Processing Systems, 2008.
[27] F. Schroff, "Semantic Image Segmentation and Web-Supervised Visual Learning," DPhil thesis, Univ. of Oxford, 2009.
[28] F. Schroff, A. Criminisi, and A. Zisserman, "Harvesting Image Databases from the Web," Proc. 11th Int'l Conf. Computer Vision, 2007.
[29] F. Schroff, A. Criminisi, and A. Zisserman, "Harvesting Image Databases from the Web," http://www.robots.ox.ac.uk/~vgg/datamkdb, 2007.
[30] Y. Teh, M. Jordan, M. Beal, and D. Blei, "Hierarchical Dirichlet Processes," Technical Report 653, 2003.
[31] A. Torralba, "Contextual Priming for Object Detection," Int'l J. Computer Vision, vol. 53, no. 2, pp. 153-167, 2003.
[32] VGG, "Affine Covariant Features," http://www.robots.ox.ac.uk/~vgg/research/affine/index.html, 2010.
[33] S. Vijayanarasimhan and K. Grauman, "Keywords to Visual Categories: Multiple-Instance Learning for Weakly Supervised Object Categorization," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[34] G. Wang and D. Forsyth, "Object Image Retrieval by Exploiting Online Knowledge Resources," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[35] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, "Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study," Int'l J. Computer Vision, vol. 73, no. 2, pp. 213-238, 2007.

Index Terms:
Weakly supervised, computer vision, object recognition, image retrieval.
Citation:
Florian Schroff, Antonio Criminisi, Andrew Zisserman, "Harvesting Image Databases from the Web," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 4, pp. 754-766, April 2011, doi:10.1109/TPAMI.2010.133