Issue No. 10 - Oct. 2013 (vol. 19)
pp. 1646-1663
C. Görg, Comput. Biosci. Program, Univ. of Colorado, Aurora, CO, USA
Zhicheng Liu, Dept. of Comput. Sci., Stanford Univ., Stanford, CA, USA
Jaeyeon Kihm, Cornell CIS, Ithaca, NY, USA
Jaegul Choo, Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
Haesun Park, Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
J. Stasko, Sch. of Interactive Comput., Georgia Inst. of Technol., Atlanta, GA, USA
ABSTRACT
Investigators across many disciplines and organizations must sift through large collections of text documents to understand and piece together information. Whether they are fighting crime, curing diseases, deciding what car to buy, or researching a new field, investigators will inevitably encounter text documents. Taking a visual analytics approach, we integrate multiple text analysis algorithms with a suite of interactive visualizations to provide a flexible and powerful environment that allows analysts to explore collections of documents while sensemaking. Our particular focus is on the process of integrating automated analyses with interactive visualizations in a smooth and fluid manner. We illustrate this integration through two example scenarios: an academic researcher examining InfoVis and VAST conference papers and a consumer exploring car reviews while pondering a purchase decision. Finally, we provide lessons learned toward the design and implementation of visual analytics systems for document exploration and understanding.
INDEX TERMS
Text analysis, Visualization, Measurement, Data visualization, Algorithm design and analysis, Computational modeling, Tag clouds, document analysis, Visual analytics, information visualization, sensemaking, exploratory search, information seeking
CITATION
C. Görg, Zhicheng Liu, Jaeyeon Kihm, Jaegul Choo, Haesun Park, J. Stasko, "Combining Computational Analyses and Interactive Visualization for Document Exploration and Sensemaking in Jigsaw", IEEE Transactions on Visualization & Computer Graphics, vol. 19, no. 10, pp. 1646-1663, Oct. 2013, doi:10.1109/TVCG.2012.324