Issue No.03 - March (2014 vol.36)
pp: 453-465
Christoph H. Lampert, Inst. of Sci. & Technol. Austria, Klosterneuburg, Austria
Hannes Nickisch, Philips Res., Hamburg, Germany
Stefan Harmeling, Max Planck Inst. for Intell. Syst., Tübingen, Germany
ABSTRACT
We study the problem of object recognition for categories for which we have no training examples, a task also called zero-data or zero-shot learning. This situation has hardly been studied in computer vision research, even though it occurs frequently; the world contains tens of thousands of different object classes, and image collections have been formed and suitably annotated for only a few of them. To tackle the problem, we introduce attribute-based classification: Objects are identified based on a high-level description that is phrased in terms of semantic attributes, such as the object's color or shape. Because the identification of each such property transcends the specific learning task at hand, the attribute classifiers can be prelearned independently, for example, from existing image data sets unrelated to the current task. Afterward, new classes can be detected based on their attribute representation, without the need for a new training phase. In this paper, we also introduce a new data set, Animals with Attributes, of over 30,000 images of 50 animal classes, annotated with 85 semantic attributes. Extensive experiments on this and two more data sets show that attribute-based classification is indeed able to categorize images without access to any training images of the target classes.
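The idea of recognizing a class from its attribute description alone can be illustrated with a minimal sketch. It is not taken from the paper: the scoring rule (a product of per-attribute probabilities, computed in log space), the function names, the attribute names, and all numbers below are assumptions made for illustration. The sketch assumes that pre-trained attribute classifiers have already produced a probability for each attribute of a test image, and that each unseen class is described by a binary attribute signature.

```python
import math


def zero_shot_scores(attr_probs, class_signatures):
    """Score each unseen class from per-attribute probabilities.

    attr_probs: list of floats, the assumed output of pre-trained
        attribute classifiers (probability that each attribute is present).
    class_signatures: dict mapping class name -> list of 0/1 flags,
        a hypothetical attribute description of each unseen class.
    """
    scores = {}
    for name, signature in class_signatures.items():
        log_score = 0.0
        for p, present in zip(attr_probs, signature):
            # Clamp to avoid log(0) for over-confident classifiers.
            p = min(max(p, 1e-6), 1.0 - 1e-6)
            # Reward agreement between prediction and class description.
            log_score += math.log(p if present else 1.0 - p)
        scores[name] = log_score
    return scores


def predict(attr_probs, class_signatures):
    """Return the unseen class whose description best matches the image."""
    scores = zero_shot_scores(attr_probs, class_signatures)
    return max(scores, key=scores.get)


# Hypothetical attributes: [striped, has_hooves, lives_in_water]
signatures = {
    "zebra": [1, 1, 0],
    "dolphin": [0, 0, 1],
}

# Hypothetical attribute-classifier outputs for one test image.
label = predict([0.9, 0.8, 0.1], signatures)
```

No image of either class is needed to build `signatures`; only the attribute classifiers require training data, which may come from unrelated classes.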
INDEX TERMS
Training, Semantics, Vectors, Computer vision, Marine animals, Probabilistic logic, Vision and scene understanding, Object recognition
CITATION
Christoph H. Lampert, Hannes Nickisch, Stefan Harmeling, "Attribute-Based Classification for Zero-Shot Visual Object Categorization", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.36, no. 3, pp. 453-465, March 2014, doi:10.1109/TPAMI.2013.140