Issue No.06 - November/December (2009 vol.15)
pp: 1145-1152
Hendrik Strobelt , University of Konstanz
Daniela Oelke , University of Konstanz
Christian Rohrdantz , University of Konstanz
Andreas Stoffel , University of Konstanz
Daniel A. Keim , University of Konstanz
Oliver Deussen , University of Konstanz
Finding suitable, less space-consuming views of a document's main content is crucial for providing convenient access to large document collections on display devices of different sizes. We present a novel compact visualization that represents a document's key semantics as a mixture of images and important key terms, similar to cards in a Top Trumps game. The key terms are extracted using an advanced text mining approach based on fully automatic document structure extraction. The images and their captions are extracted using a graphical heuristic, and the captions are used for a semi-semantic image weighting. Furthermore, we use the image color histogram for classification and show at least one representative from each non-empty image class. The approach is demonstrated on the IEEE InfoVis publications of a complete year. The method can easily be applied to other publication collections and to sets of documents that contain images.
document visualization, visual summary, content extraction, document collection browsing
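The image selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the classifier, the class names, or the weighting formula, so the class labels ("photo", "chart", "table"), the weights, and the function names `color_histogram` and `pick_representatives` are all hypothetical. The sketch only shows how a coarse color histogram could serve as a classification feature and how at least one image per non-empty class could be kept before filling the remaining slots by weight.

```python
from collections import defaultdict


def color_histogram(pixels, bins_per_channel=4):
    """Normalized coarse RGB histogram for a list of (r, g, b) tuples (0-255).

    Hypothetical feature extractor; the classifier that would consume it
    is not specified in the abstract and is left out here.
    """
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..n-1.
        idx = ((r * n // 256) * n + (g * n // 256)) * n + (b * n // 256)
        hist[idx] += 1.0
    total = len(pixels) or 1
    return [count / total for count in hist]


def pick_representatives(images, max_images):
    """Select images for one card.

    images: list of (name, image_class, weight) tuples, where image_class
    and weight are assumed to come from earlier steps (classification and
    caption-based weighting).
    """
    by_class = defaultdict(list)
    for image in images:
        by_class[image[1]].append(image)

    # Keep the highest-weighted image of every non-empty class.
    chosen = [max(group, key=lambda im: im[2]) for group in by_class.values()]
    chosen.sort(key=lambda im: im[2], reverse=True)

    # Fill any remaining slots with the best images not yet chosen.
    rest = sorted(
        (im for im in images if im not in chosen),
        key=lambda im: im[2],
        reverse=True,
    )
    free_slots = max(0, max_images - len(chosen))
    return chosen + rest[:free_slots]


if __name__ == "__main__":
    candidates = [
        ("fig1", "photo", 0.9),
        ("fig2", "photo", 0.8),
        ("fig3", "chart", 0.2),
        ("fig4", "table", 0.1),
    ]
    print([name for name, _, _ in pick_representatives(candidates, 3)])
```

In this sketch a low-weighted chart or table still appears on the card ahead of a second photo, which mirrors the stated goal of showing every non-empty image class at least once.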
Hendrik Strobelt, Daniela Oelke, Christian Rohrdantz, Andreas Stoffel, Daniel A. Keim, Oliver Deussen, "Document Cards: A Top Trumps Visualization for Documents", IEEE Transactions on Visualization & Computer Graphics, vol.15, no. 6, pp. 1145-1152, November/December 2009, doi:10.1109/TVCG.2009.139