Enhanced Perceptual Distance Functions and Indexing for Image Replica Recognition
March 2005 (vol. 27 no. 3)
pp. 379-391
The proliferation of digital images and the widespread distribution of digital data made possible by the Internet have increased the problems associated with copyright infringement on digital images. Watermarking schemes have been proposed to safeguard copyrighted images, but watermarks are vulnerable to image processing and geometric distortions and may not be very effective. Thus, content-based detection of pirated images has become an important application. In this paper, we discuss two important aspects of such a replica detection system: distance functions for similarity measurement, and scalability. We extend our previous work on perceptual distance functions, which proposed the Dynamic Partial Function (DPF), and present enhanced techniques that overcome the limitations of DPF. These techniques include the Thresholding, Sampling, and Weighting schemes. Experimental evaluations show superior performance compared to DPF and other distance functions. We then address the issue of using these perceptual distance functions to efficiently detect replicas in large image data sets. Indexing is made challenging by the high dimensionality of the data and the nonmetric nature of the distance functions. We propose using Locality Sensitive Hashing (LSH) to index images while using the above perceptual distance functions, and we demonstrate good performance through empirical studies on a very large database of diverse images.
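
The abstract names DPF without defining it. As a rough illustration only, the sketch below assumes the usual formulation from the earlier DPF work: compute the per-feature absolute differences between two feature vectors, keep only the m smallest, and combine them Minkowski-style with exponent r. The function name `dpf_distance` and the toy vectors are hypothetical, and the Thresholding, Sampling, and Weighting schemes are not shown.

```python
def dpf_distance(x, y, m, r=2.0):
    """Sketch of a Dynamic Partial Function style distance.

    Assumption: only the m smallest per-feature differences contribute,
    combined as (sum of d**r) ** (1/r). With m == len(x) this reduces to
    the ordinary Minkowski distance of order r.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if not 1 <= m <= len(x):
        raise ValueError("m must be between 1 and the number of features")
    # Per-feature absolute differences, smallest first.
    diffs = sorted(abs(a - b) for a, b in zip(x, y))
    # Ignore the largest differences; keep the m most similar features.
    return sum(d ** r for d in diffs[:m]) ** (1.0 / r)


# Toy example: one feature differs wildly (e.g. a heavily edited region).
original = [0.0, 0.0, 0.0, 0.0]
replica = [3.0, 4.0, 100.0, 0.0]

partial = dpf_distance(original, replica, m=3)  # drops the outlier feature
full = dpf_distance(original, replica, m=4)     # plain Euclidean distance
print(partial, full)
```

Because the set of features that contributes changes from one pair of images to the next, a distance of this kind need not satisfy the triangle inequality, which is consistent with the abstract's remark that these distance functions are nonmetric.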

Index Terms:
Indexing methods, image databases, image retrieval.
Arun Qamra, Yan Meng, Edward Y. Chang, "Enhanced Perceptual Distance Functions and Indexing for Image Replica Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 379-391, March 2005, doi:10.1109/TPAMI.2005.54