January–March 2013 (Vol. 20, No. 1) pp. 92, 91
1070-986X/13/$31.00 © 2013 IEEE

Published by the IEEE Computer Society
Just the Facets
John R. Smith, IBM Research
With the explosion of multimedia data, it is more important than ever to create capabilities for cataloging images based on visual content. Although unprecedented access to large datasets has greatly accelerated research related to multimedia semantic modeling, we remain far from creating computerized capabilities for recognizing content in everyday images. One reason is that too little has been learned from good old library science. Libraries have long studied classification methods and schemes. By building on the foundations of library science, researchers can develop a principled approach for creating suitable visual classification schemes to help bridge the multimedia semantic gap. 1
One of the most popular image data resources used by the multimedia research community, ImageNet, can offer a simple illustrated example. 2 ImageNet, a large image repository organized according to the WordNet hierarchy, aims to collect at least 1,000 images for each of tens of thousands of WordNet concepts. Much research is being done on automatically classifying images using ImageNet. However, from a visual knowledge point of view, ImageNet is not a good classification scheme. The ImageNet hierarchy follows a generic is-a relationship between superclasses and subclasses. Sibling subclasses are therefore implicitly treated as mutually exclusive even when they describe independent properties, so mutual exclusivity is not modeled appropriately.
For example, consider a photograph that shows a person who is a good guy, has a beard, is a philosopher, and is a native of Africa. According to ImageNet, these categories are mutually exclusive, as Figure 1 shows. As a result, there will be tremendous confusion in training classifiers using one-versus-all methods because, for example, under ImageNet a person should either be a good guy or have a beard, but not both. I'll make sure to shave more closely tomorrow.




Figure 1. Excerpt from ImageNet hierarchy showing inappropriate modeling of mutual exclusivity.
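
The one-versus-all confusion described above can be made concrete with a small sketch. It assumes a hypothetical photo whose subject fits four categories but is filed under only one of them in a flat scheme. The category names loosely follow the example in the text, and the variable and function names are illustrative; nothing here is part of ImageNet's own tooling.

```python
# Hypothetical flat scheme: each photo is filed under exactly one category,
# so sibling categories are implicitly treated as mutually exclusive.
FLAT_CATEGORIES = ["good guy", "bearded man", "philosopher", "African"]

# Illustrative ground truth for one photo: the person fits all four categories.
true_attributes = {"good guy", "bearded man", "philosopher", "African"}

# The flat scheme nevertheless stores the photo under a single category.
filed_under = "philosopher"


def one_vs_all_targets(filed_under, categories):
    """Return the binary target each one-versus-all classifier sees for this photo."""
    return {c: (c == filed_under) for c in categories}


targets = one_vs_all_targets(filed_under, FLAT_CATEGORIES)

# Categories for which the photo is used as a negative example even though
# the attribute is actually present: contradictory training signals.
false_negatives = sorted(
    c for c, is_positive in targets.items()
    if not is_positive and c in true_attributes
)
print(false_negatives)
```

In this toy setup, three of the four classifiers would be trained to reject a photo that actually shows their concept.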



One way to address this situation is with facets. Libraries have long practiced the idea that there are multiple ways to view the world using facets, 3 which let us represent multiple perspectives simultaneously. Each facet follows its own hierarchical relationship system, and there is no mutual exclusivity across facets. As a result, categories won't be pitted against each other inappropriately.
Facets have also received some attention in image search. 4 For example, the Flamenco system allows a faceted search of art images that includes dimensions such as location, objects, and shapes and colors (see http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/famuseum/Flamenco). Clearly, the visual world requires multiple dimensions to be modeled simultaneously, and a faceted classification scheme can represent these multiple, visual perspectives.
Figure 2 illustrates how facets can be applied to the ImageNet excerpt such that independent hierarchies can be developed for each dimension of a person. In this example, facets could correspond to temperament, gender, facial hair, occupation, and nationality. This would provide a better representation for the visual concepts and their relationships, which would improve classifier training as well.




Figure 2. Modified example of the ImageNet excerpt using faceted classification.
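
A faceted scheme of this kind could be sketched roughly as follows. The facet names come from the text; the values under each facet, the flattening of each facet's hierarchy into a simple list, and the helper functions `validate` and `negative_categories` are assumptions made for illustration only.

```python
# Facet names follow the text; the values under each facet are illustrative
# placeholders, and each facet is flattened to a list rather than a hierarchy.
FACETS = {
    "temperament": ["good guy", "bad guy"],
    "gender": ["male", "female"],
    "facial hair": ["bearded", "clean-shaven"],
    "occupation": ["philosopher", "engineer"],
    "nationality": ["African", "European"],
}


def validate(label):
    """Check that a label assigns known values to known facets."""
    for facet, value in label.items():
        if facet not in FACETS:
            raise ValueError(f"unknown facet: {facet}")
        if value not in FACETS[facet]:
            raise ValueError(f"unknown value {value!r} for facet {facet!r}")


def negative_categories(facet, value):
    """Categories a photo may serve as a negative example for.

    Assumption: values compete only within their own facet, so values
    from other facets are never used as negatives.
    """
    return [v for v in FACETS[facet] if v != value]


# One photo carries a value in several facets at once, with no conflict.
photo = {
    "temperament": "good guy",
    "facial hair": "bearded",
    "occupation": "philosopher",
    "nationality": "African",
}
validate(photo)

negatives = {facet: negative_categories(facet, value) for facet, value in photo.items()}
print(negatives)
```

Because negatives are drawn only from within a facet, a bearded philosopher is never a negative example for "good guy"; the photo simply has no entry for a facet it leaves unlabeled (here, gender).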



Resources such as ImageNet are extremely helpful, but we have a long way to go to bridge the semantic gap. Facets are an important construct from library science that has been inexplicably absent from work on multimedia semantic modeling. What is perhaps most notable is that the multimedia community sorely needs an effective modeling methodology and shared semantic representation for visual information. WordNet has been invaluable as a definitive lexical resource for text and language research. However, the world of images differs significantly from the world of words. Until a comparable resource is developed for visual information, we will have to make do with what we have. But one thing is for sure: when it is created, it will have facets.

References

John R. Smith is a senior manager of Intelligent Information Management at IBM T.J. Watson Research Center. Contact him at jsmith@us.ibm.com.