Issue No. 09 - September (1995 vol. 28)
DOI: http://doi.ieeecomputersociety.org/10.1109/2.410153
This research explores the interaction of textual and photographic information in an integrated text/image database environment developed at the Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York, Buffalo. The idea is to extract information from a newspaper photo caption that can be used for retrieving the picture and for identifying the people shown. A multistage system, called Piction, uses spatial and characteristic constraints derived from the caption in labeling face candidates generated by a face locator. Several other vision systems employ the idea of top-down control in picture understanding by providing the general context; this system carries the notion one step further, exploiting not only general context but also picture-specific context. The author gives several examples showing how information from both text and images can be used in computing the similarity between a given query and an image in the database to satisfy focus-of-attention queries. Although Piction represents only a preliminary foray into truly integrated text/image content-based retrieval, it shows that additional discriminatory capabilities can be obtained by combining the two sources of information. Much work remains, however, both in improving the language processing capabilities and in face location and characterization.
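The abstract does not spell out Piction's labeling procedure. The following is therefore only a minimal illustrative sketch, not the author's method. It shows the general idea of using caption-derived constraints to label face candidates. Everything in it is a hypothetical assumption: the `FaceCandidate` structure, the `label_faces` function, the "left of" relation used as the spatial constraint, and the attribute dictionary used for characteristic constraints. The sketch enumerates every assignment of caption names to distinct face boxes and keeps only the assignments that satisfy all constraints.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class FaceCandidate:
    """A face box from some face locator (hypothetical structure)."""
    face_id: str
    x_center: float                       # horizontal centre of the box
    attributes: dict = field(default_factory=dict)


def label_faces(names, faces, left_of, required_attrs):
    """Return every assignment of caption names to distinct face candidates
    that satisfies all spatial and characteristic constraints.

    names          -- people mentioned in the caption
    faces          -- FaceCandidate objects produced by the face locator
    left_of        -- pairs (a, b): person a appears left of person b
    required_attrs -- {name: {attribute: value}} inferred from the caption
    """
    solutions = []
    # Brute-force search: try each ordered choice of len(names) distinct faces.
    for chosen in permutations(faces, len(names)):
        assignment = dict(zip(names, chosen))
        # Spatial constraints: compare horizontal centres of assigned boxes.
        if not all(assignment[a].x_center < assignment[b].x_center
                   for a, b in left_of):
            continue
        # Characteristic constraints: each required attribute must match.
        if all(assignment[n].attributes.get(k) == v
               for n, attrs in required_attrs.items()
               for k, v in attrs.items()):
            solutions.append({n: f.face_id for n, f in assignment.items()})
    return solutions


if __name__ == "__main__":
    # Hypothetical caption: "Smith (left) with Jones ..."; Jones known to be female.
    faces = [FaceCandidate("f1", 100, {"gender": "male"}),
             FaceCandidate("f2", 300, {"gender": "female"}),
             FaceCandidate("f3", 500, {"gender": "male"})]
    print(label_faces(["Smith", "Jones"], faces,
                      [("Smith", "Jones")], {"Jones": {"gender": "female"}}))
```

Exhaustive enumeration grows factorially with the number of faces. It is only reasonable for the handful of people in a typical news photo. A fuller system would more likely prune candidates with constraint propagation and would rank partial matches instead of discarding them.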
Rohini K. Srihari, "Automatic Indexing and Content-Based Retrieval of Captioned Images", Computer, vol. 28, no. 9, pp. 49-56, September 1995, doi:10.1109/2.410153