Automatic Website Summarization by Image Content: A Case Study with Logo and Trademark Images
September 2008 (vol. 20 no. 9)
pp. 1195-1204
Image-based abstraction (or summarization) of a Web site is the process of extracting its most characteristic (or important) images. The importance of an image within a Web site is measured by its frequency of occurrence, the characteristics of its content, and Web link information. As a case study, this work focuses on logo and trademark images, which are important characteristic signs of corporate Web sites or of the products presented on them. The proposed method uses machine learning to distinguish logos and trademarks from images of other categories (e.g., landscapes, faces). Because the same logo or trademark may appear many times, and in various forms, within the same Web site, duplicates are detected and only unique logo and trademark images are extracted. These images are then ranked by importance, taking frequency of occurrence, image content, and Web link information into account. Finally, the most important logos and trademarks are selected to form the image-based summary of the Web site. Evaluation results of the method on real Web sites are also presented. The method has been implemented and integrated into a fully automated image-based summarization system, which is accessible on the Web (
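The abstract outlines a pipeline (classify, remove duplicates, rank, select) but does not give the paper's similarity measure or ranking formula. The following is a minimal Python sketch of that pipeline under stated assumptions, not the authors' implementation: the classifier output is supplied as a precomputed `logo_prob`, duplicates are detected with a simple average hash over pre-scaled 8x8 grayscale thumbnails (a swapped-in technique), link information is approximated by an `inlinks` count, and the weights, thresholds, and all names (`SiteImage`, `average_hash`, `group_duplicates`, `summarize`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SiteImage:
    """Hypothetical record for one image collected from a crawled site."""
    url: str
    pixels: Sequence[Sequence[int]]  # grayscale thumbnail, assumed already scaled to 8x8
    logo_prob: float                 # assumed output of a logo/non-logo classifier
    inlinks: int                     # assumed link signal: pages that embed or link to it


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the thumbnail mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def group_duplicates(images: List[SiteImage], max_dist: int = 5) -> List[List[SiteImage]]:
    """Greedily put each image into the first group whose representative hash is close."""
    groups: List[List[SiteImage]] = []
    reps: List[int] = []
    for img in images:
        h = average_hash(img.pixels)
        for i, rep in enumerate(reps):
            if hamming(h, rep) <= max_dist:
                groups[i].append(img)
                break
        else:  # no close group found: start a new one
            groups.append([img])
            reps.append(h)
    return groups


def summarize(images: List[SiteImage], k: int = 3, logo_threshold: float = 0.5,
              weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),
              max_dist: int = 5) -> List[str]:
    """Return URLs of the k highest-scoring unique logo candidates."""
    # 1. Keep only images the (assumed) classifier considers logos/trademarks.
    logos = [im for im in images if im.logo_prob >= logo_threshold]
    # 2. Collapse near-duplicates; group size serves as frequency of occurrence.
    groups = group_duplicates(logos, max_dist)
    if not groups:
        return []
    max_freq = max(len(g) for g in groups)
    max_links = max(sum(im.inlinks for im in g) for g in groups) or 1
    w_freq, w_content, w_link = weights
    # 3. Score each group by a weighted sum of normalized frequency, content and links.
    scored = []
    for g in groups:
        rep = max(g, key=lambda im: im.logo_prob)  # representative copy
        links = sum(im.inlinks for im in g)
        score = (w_freq * len(g) / max_freq
                 + w_content * rep.logo_prob
                 + w_link * links / max_links)
        scored.append((score, rep.url))
    # 4. Highest scores first; keep the top k as the summary.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [url for _, url in scored[:k]]
```

With two near-identical copies of one logo, a single copy of another, and a photo that the classifier scores low, the photo is filtered out, the two copies collapse into one group, and the repeated logo ranks first. The greedy grouping is order-dependent and compares each image only against group representatives; a production system would likely use a stronger similarity measure and proper clustering.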

Index Terms:
Information Storage and Retrieval, Content Analysis and Indexing, Abstracting methods, Indexing Methods, Applications
Evdoxios Baratis, Euripides G.M. Petrakis, Evangelos E. Milios, "Automatic Website Summarization by Image Content: A Case Study with Logo and Trademark Images," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 9, pp. 1195-1204, Sept. 2008, doi:10.1109/TKDE.2008.34