Association and Content-Based Retrieval
January/February 2003 (vol. 15 no. 1)
pp. 118-135
Chabane Djeraba, IEEE Computer Society

Abstract—In spite of significant efforts in content-based indexing and retrieval in recent years, finding relevant and accurate images remains a very difficult task. In state-of-the-art approaches, retrieval may be efficient for queries whose semantic content can be easily translated into visual features. For example, finding images of fires is simple because fires are characterized by specific colors (yellow and red). However, retrieval is not efficient in other application fields, in which the semantic content of the query is not easily translated into visual features. For example, finding images of birds during migration is not easy because the system has to understand the semantics of the query. Basic visual features may be useful for such a query (a bird is characterized by a texture and a color), but they are not sufficient. What is missing is the capability to generalize. Birds during migration belong to the same repository of birds, so they share common associations among basic features (e.g., textures and colors) that the user cannot specify explicitly. In this paper, we present an approach that discovers hidden associations among features during image indexing. These associations discriminate between image repositories, and the best associations are selected on the basis of confidence measures. Because the images in the database contain very large numbers of colors and textures, we limit the combinatorial explosion of associations with a visual dictionary that groups similar colors and textures together and thus summarizes the image features. The visual dictionary is created by an algorithm based on a clustering strategy. The discovered associations permit the automatic classification of images as they are inserted into image repositories, and they allow the system to return accurate and relevant results.
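The abstract says only that the visual dictionary is built by "an algorithm based on a clustering strategy." As a minimal sketch, assuming plain k-means (Lloyd's algorithm) over RGB triples stands in for that unspecified strategy, grouping similar colors into a small set of visual words could look like the following. The names `build_visual_dictionary` and `nearest_word`, the farthest-point initialization, and the sample pixels are all hypothetical, and textures are omitted.

```python
from math import dist
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def nearest_word(color: Color, dictionary: Sequence[Color]) -> int:
    """Index of the dictionary entry closest (Euclidean) to the color."""
    return min(range(len(dictionary)), key=lambda i: dist(color, dictionary[i]))


def build_visual_dictionary(colors: Sequence[Color], k: int, iterations: int = 20) -> List[Color]:
    """Group similar colors into k visual words with Lloyd's k-means (assumed stand-in)."""
    if not 1 <= k <= len(colors):
        raise ValueError("k must be between 1 and the number of colors")
    # Deterministic farthest-point initialization: start from the first color,
    # then repeatedly add the color farthest from every chosen centroid.
    centroids: List[Color] = [tuple(colors[0])]
    while len(centroids) < k:
        farthest = max(colors, key=lambda c: min(dist(c, m) for m in centroids))
        centroids.append(tuple(farthest))
    for _ in range(iterations):
        # Assignment step: each color joins its nearest centroid.
        clusters: List[List[Color]] = [[] for _ in centroids]
        for c in colors:
            clusters[nearest_word(c, centroids)].append(c)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(channel) / len(cluster) for channel in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: assignments no longer move any centroid
        centroids = updated
    return centroids


# Hypothetical RGB pixels: two reds, two blues, one yellow.
pixels = [(250, 10, 5), (240, 20, 10), (10, 10, 240), (20, 5, 250), (245, 240, 10)]
words = build_visual_dictionary(pixels, k=3)
indexed = [nearest_word(p, words) for p in pixels]
print(indexed)  # [0, 0, 1, 1, 2]
```

Once each pixel is replaced by the index of its nearest visual word, later steps work on a handful of word ids rather than on every distinct color, which is how a dictionary of this kind summarizes image features.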
More generally, we show that content- and knowledge-based indexing and retrieval is more efficient than retrieval based exclusively on content, thereby inaugurating a new generation of approaches in which knowledge contributes to finding images in large image repositories.
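The abstract states that associations among features discriminate image repositories and that the best ones are kept according to measures of confidence, but it does not give the exact measure or mining procedure. A minimal sketch, assuming the standard association-rule definition confidence(X → R) = support(X ∪ {R}) / support(X), where X is a set of visual-word ids and R a repository label: rules passing support and confidence thresholds are kept, and a newly inserted image goes to the repository of its most confident matching rule. The names `mine_rules` and `classify`, the thresholds, and the exhaustive enumeration of word combinations (exponential in `max_size`, the kind of growth a visual dictionary helps contain) are illustrative assumptions, not the paper's algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# A rule "X -> R": a set X of visual-word ids implies repository R.
Rule = Tuple[FrozenSet[int], str]


def mine_rules(
    images: List[Tuple[Set[int], str]],
    max_size: int = 2,
    min_support: int = 2,
    min_confidence: float = 0.8,
) -> Dict[Rule, float]:
    """Return rules X -> R whose support and confidence pass the thresholds."""
    itemset_count: Counter = Counter()  # support(X): images containing X
    rule_count: Counter = Counter()  # support(X with R): those images labeled R
    for words, repo in images:
        # Exhaustive enumeration of word combinations up to max_size.
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(words), size):
                itemset = frozenset(combo)
                itemset_count[itemset] += 1
                rule_count[(itemset, repo)] += 1
    rules: Dict[Rule, float] = {}
    for (itemset, repo), n in rule_count.items():
        confidence = n / itemset_count[itemset]
        if n >= min_support and confidence >= min_confidence:
            rules[(itemset, repo)] = confidence
    return rules


def classify(words: Set[int], rules: Dict[Rule, float]) -> Optional[str]:
    """Repository of the most confident rule whose words all occur in the image;
    ties on confidence prefer the rule with more words. None if nothing matches."""
    best: Optional[str] = None
    best_key = (-1.0, -1)
    for (itemset, repo), confidence in rules.items():
        if itemset <= words:
            key = (confidence, len(itemset))
            if key > best_key:
                best_key, best = key, repo
    return best


# Hypothetical training data: visual-word ids per image, plus its repository.
training = [
    ({0, 1}, "birds"),
    ({0, 1, 5}, "birds"),
    ({0, 1}, "birds"),
    ({2, 3}, "fires"),
    ({1, 2, 3}, "fires"),
    ({0, 4}, "landscapes"),
    ({0, 4}, "landscapes"),
]
rules = mine_rules(training)
print(classify({0, 1, 7}, rules))  # birds
```

In this toy data, the single word 0 is rejected as evidence for birds (confidence 3/5, because it also occurs in landscape images), whereas the pair {0, 1} reaches confidence 1.0. This illustrates the abstract's point that a combination of basic features can discriminate a repository when no single feature does.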

Index Terms:
Image, indexing, retrieval, similarity, association.
Chabane Djeraba, "Association and Content-Based Retrieval," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 1, pp. 118-135, Jan.-Feb. 2003, doi:10.1109/TKDE.2003.1161586