Issue No. 11, November 2008 (vol. 30)
pp. 1958-1970
Antonio Torralba , MIT, Cambridge
Rob Fergus , New York University, New York
William T. Freeman , MIT, Cambridge
ABSTRACT
With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Internet. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 × 32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the Wordnet lexical database. Hence, the image database gives comprehensive coverage of all object categories and scenes. The semantic information from Wordnet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels, minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate recognition performance comparable to that of class-specific Viola-Jones-style detectors.
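The abstract's pipeline can be sketched in a few lines. The following is a minimal, hedged illustration (not the authors' code) of the two core ideas: matching 32 × 32 color thumbnails by sum-of-squared-differences (SSD) nearest neighbors, and pooling the neighbors' noisy labels at a coarser Wordnet level to absorb labeling noise. The data and the tiny hypernym map are synthetic stand-ins for the 79-million-image dataset and the full Wordnet hierarchy; all function names here are illustrative.

```python
import numpy as np

def to_tiny(img, size=32):
    """Downsample an HxWx3 array to size x size by nearest-pixel subsampling."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols].astype(np.float32)

def knn_labels(query, dataset, labels, k=3):
    """Labels of the k dataset thumbnails closest to the query in SSD."""
    d = ((dataset - query[None]) ** 2).sum(axis=(1, 2, 3))
    return [labels[i] for i in np.argsort(d)[:k]]

# Toy stand-in for the Wordnet hypernym hierarchy: fine labels map to a
# coarser semantic level, where voting is more robust to label noise.
HYPERNYM = {"dog": "animal", "cat": "animal", "car": "vehicle", "truck": "vehicle"}

def classify(query, dataset, labels, k=3):
    votes = [HYPERNYM[l] for l in knn_labels(query, dataset, labels, k)]
    return max(set(votes), key=votes.count)  # majority vote at the coarse level

rng = np.random.default_rng(0)
bright = rng.uniform(200, 255, (6, 64, 64, 3))  # pretend these depict animals
dark = rng.uniform(0, 55, (6, 64, 64, 3))       # pretend these depict vehicles
dataset = np.stack([to_tiny(im) for im in np.concatenate([bright, dark])])
labels = ["dog", "cat", "dog"] * 2 + ["car", "truck", "car"] * 2

query = to_tiny(rng.uniform(190, 255, (64, 64, 3)))
print(classify(query, dataset, labels))  # neighbors are all bright -> "animal"
```

Even with individually noisy fine-grained labels ("dog" vs. "cat"), voting one level up the hierarchy yields a stable coarse prediction — the mechanism the abstract describes for classification "over a range of semantic levels."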
INDEX TERMS
Computer vision, object recognition, large datasets, nearest-neighbor methods
CITATION
Antonio Torralba, Rob Fergus, William T. Freeman, "80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1958-1970, November 2008, doi:10.1109/TPAMI.2008.128