Issue No.05 - May (2011 vol.33)
pp: 1022-1036
Yiming Liu , Nanyang Technological University, Singapore
Dong Xu , Nanyang Technological University, Singapore
Ivor Wai-Hung Tsang , Nanyang Technological University, Singapore
Jiebo Luo , Kodak Research Laboratories, Eastman Kodak Company, Rochester
ABSTRACT
The rapid popularization of digital cameras and mobile phone cameras has led to an explosive growth of personal photo collections by consumers. In this paper, we present a real-time textual query-based personal photo retrieval system by leveraging millions of Web images and their associated rich textual descriptions (captions, categories, etc.). After a user provides a textual query (e.g., "water"), our system exploits the inverted file to automatically find the positive Web images that are related to the textual query "water" as well as the negative Web images that are irrelevant to the textual query. Based on these automatically retrieved relevant and irrelevant Web images, we employ three simple but effective classification methods, k-Nearest Neighbor (kNN), decision stumps, and linear SVM, to rank personal photos. To further improve the photo retrieval performance, we propose two relevance feedback methods via cross-domain learning, which effectively utilize both the Web images and personal images. In particular, our proposed cross-domain learning methods can learn robust classifiers with only a very limited amount of labeled personal photos from the user by leveraging the prelearned linear SVM classifiers in real time. We further propose an incremental cross-domain learning method in order to significantly accelerate the relevance feedback process on large consumer photo databases. Extensive experiments on two consumer photo data sets demonstrate the effectiveness and efficiency of our system, which is also inherently not limited by any predefined lexicon.
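The retrieval pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the two-dimensional feature vectors, the sample captions, and the choice to treat every Web image that does not mention the query word as a negative are all assumptions made for illustration. It shows only the kNN variant, where a personal photo is scored by the fraction of its k nearest Web images that are positives.

```python
from collections import defaultdict
import math


def build_inverted_file(web_images):
    """Map each word to the set of Web image ids whose text contains it."""
    index = defaultdict(set)
    for image_id, (text, _features) in web_images.items():
        for word in text.lower().split():
            index[word].add(image_id)
    return index


def select_training_images(index, web_images, query):
    """Positives mention the query word; negatives are all other images (assumption)."""
    positives = set(index.get(query.lower(), set()))
    negatives = set(web_images) - positives
    return positives, negatives


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(photo_features, web_images, positives, negatives, k=3):
    """Fraction of the k nearest Web images that are positives."""
    labeled = [
        (euclidean(photo_features, web_images[i][1]), i in positives)
        for i in positives | negatives
    ]
    labeled.sort(key=lambda pair: pair[0])
    nearest = labeled[:k]
    if not nearest:
        return 0.0
    return sum(1 for _, is_positive in nearest if is_positive) / len(nearest)


def rank_photos(personal_photos, web_images, query, k=3):
    """Rank personal photo ids by descending kNN score for the query."""
    index = build_inverted_file(web_images)
    positives, negatives = select_training_images(index, web_images, query)
    scores = {
        photo_id: knn_score(features, web_images, positives, negatives, k)
        for photo_id, features in personal_photos.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: (associated text, visual feature vector).
web_images = {
    "w1": ("lake water sunset", (0.9, 0.1)),
    "w2": ("blue water beach", (0.8, 0.2)),
    "w3": ("city street night", (0.1, 0.9)),
    "w4": ("office desk", (0.2, 0.8)),
}
personal_photos = {"p_beach": (0.85, 0.15), "p_office": (0.15, 0.85)}

ranking = rank_photos(personal_photos, web_images, "water", k=2)
print(ranking)  # ['p_beach', 'p_office']
```

Because the positives and negatives come from the text attached to the Web images rather than from a fixed concept list, any word that appears in the index can serve as a query, which is the sense in which such a system is not tied to a predefined lexicon.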
INDEX TERMS
Textual query-based consumer photo retrieval, large-scale Web data, cross-domain learning.
CITATION
Yiming Liu, Dong Xu, Ivor Wai-Hung Tsang, Jiebo Luo, "Textual Query of Personal Photos Facilitated by Large-Scale Web Data", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.33, no. 5, pp. 1022-1036, May 2011, doi:10.1109/TPAMI.2010.142