Issue No.06 - June (2008 vol.30)
pp: 985-1002
ABSTRACT
Developing effective methods for automated annotation of digital pictures continues to challenge computer scientists. The ability of computers to annotate pictures can lead to breakthroughs in a wide range of applications, including Web image search, online picture-sharing communities, and scientific experiments. In this work, the authors developed new optimization and estimation techniques to address two fundamental problems in machine learning. These new techniques serve as the basis for the Automatic Linguistic Indexing of Pictures - Real Time (ALIPR) system for fully automatic, high-speed annotation of online pictures. In particular, the D2-clustering method, in the same spirit as k-means for vectors, is developed to group objects represented by bags of weighted vectors. Moreover, a generalized mixture modeling technique (with kernel smoothing as a special case) for non-vector data is developed using the novel concept of Hypothetical Local Mapping (HLM). ALIPR has been tested on thousands of pictures from an Internet photo-sharing site, unrelated to the source of the pictures used in the training process. Its performance has also been studied at an online demo site where arbitrary users provide pictures of their choice and indicate the correctness of each annotation word. The experimental results show that a single computer processor can suggest annotation terms in real time and with good accuracy.
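To make the phrase "bags of weighted vectors" concrete, the following is a minimal sketch of the assignment step of a k-means-style clustering over such bags. It is not the authors' implementation. The abstract does not name the distance between bags; this sketch assumes the squared Mallows distance (known in the vision literature as the earth mover's distance) and restricts each vector to a single scalar so that the optimal matching can be found by sorting. The names `Bag`, `mallows_sq_1d`, and `assign` are hypothetical, and the prototype update step, which requires an optimization procedure, is not shown.

```python
from typing import List, Tuple

# A bag is a list of (weight, value) pairs whose weights sum to 1.
# Assumption: values are scalars here; the paper works with vectors.
Bag = List[Tuple[float, float]]


def mallows_sq_1d(a: Bag, b: Bag) -> float:
    """Squared Mallows (2-Wasserstein) distance between two 1-D bags.

    In one dimension the optimal matching pairs mass in sorted order,
    so a single merge-style pass over both bags is enough.
    """
    a = sorted(a, key=lambda p: p[1])
    b = sorted(b, key=lambda p: p[1])
    i = j = 0
    wa, wb = a[0][0], b[0][0]  # mass still unmatched at a[i] and b[j]
    cost = 0.0
    while True:
        moved = min(wa, wb)
        cost += moved * (a[i][1] - b[j][1]) ** 2
        wa -= moved
        wb -= moved
        if wa <= 1e-12:  # a[i] is fully matched, advance in a
            i += 1
            if i == len(a):
                break
            wa = a[i][0]
        if wb <= 1e-12:  # b[j] is fully matched, advance in b
            j += 1
            if j == len(b):
                break
            wb = b[j][0]
    return cost


def assign(bags: List[Bag], prototypes: List[Bag]) -> List[int]:
    """Assignment step: index of the nearest prototype for each bag."""
    return [
        min(range(len(prototypes)),
            key=lambda k: mallows_sq_1d(bag, prototypes[k]))
        for bag in bags
    ]


if __name__ == "__main__":
    prototypes = [[(1.0, 0.0)], [(1.0, 10.0)]]
    bags = [[(0.5, 0.0), (0.5, 1.0)], [(0.3, 9.0), (0.7, 11.0)]]
    print(assign(bags, prototypes))
```

As in k-means, a full clustering loop would alternate this assignment step with an update of each prototype. The difference from ordinary k-means is that a prototype is itself a bag of weighted vectors, so it cannot be updated by a simple average.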
INDEX TERMS
Statistical computing, Multimedia databases, Indexing methods, Algorithms, Image/video retrieval
CITATION
Jia Li, James Z. Wang, "Real-Time Computerized Annotation of Pictures", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 6, pp. 985-1002, June 2008, doi:10.1109/TPAMI.2007.70847