WORDGRAPH: Keyword-in-Context Visualization for NETSPEAK's Wildcard Search
Sept. 2012 (vol. 18 no. 9)
pp. 1411-1423
M. Trenkmann, Web Technol. & Inf. Syst. Group, Bauhaus-Univ. Weimar, Weimar, Germany
M. Potthast, Web Technol. & Inf. Syst. Group, Bauhaus-Univ. Weimar, Weimar, Germany
H. Gruendl, Virtual Reality Syst. Group, Bauhaus-Univ. Weimar, Weimar, Germany
P. Riehmann, Virtual Reality Syst. Group, Bauhaus-Univ. Weimar, Weimar, Germany
B. Stein, Web Technol. & Inf. Syst. Group, Bauhaus-Univ. Weimar, Weimar, Germany
B. Froehlich, Virtual Reality Syst. Group, Bauhaus-Univ. Weimar, Weimar, Germany
The WORDGRAPH helps writers visually choose phrases while writing a text. It checks for the commonness of phrases and allows for the retrieval of alternatives by means of wildcard queries. To support such queries, we implement a scalable retrieval engine, which returns high-quality results within milliseconds using a probabilistic retrieval strategy. The results are displayed as a WORDGRAPH visualization or as a textual list. The graphical interface provides an effective means for interactive exploration of search results using filter techniques, query expansion, and navigation. Our observations indicate that, of the three investigated retrieval tasks, the textual interface is sufficient for phrase verification, both interfaces support context-sensitive word choice, and the WORDGRAPH best supports the exploration of a phrase's context or the underlying corpus. Our user study confirms these observations and shows that the WORDGRAPH is generally preferred over the textual result list for queries containing multiple wildcards.
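
As a rough illustration of the wildcard retrieval idea described in the abstract, the following Python sketch matches a query against a toy n-gram frequency table and ranks the hits by frequency. Everything in it is an assumption made for illustration: the counts are invented, `?` is taken to stand for exactly one word, and the names `NGRAMS`, `matches`, and `search` are hypothetical. The linear scan does not reflect the scalable index or the probabilistic retrieval strategy of the actual engine.

```python
from collections import Counter

# Toy n-gram frequency table. The counts are invented for illustration
# and do not come from any real corpus.
NGRAMS = Counter({
    ("waiting", "for", "a", "response"): 120,
    ("waiting", "for", "your", "response"): 95,
    ("waiting", "for", "the", "response"): 40,
    ("waiting", "on", "a", "response"): 10,
    ("looking", "for", "a", "response"): 5,
})


def matches(query, ngram):
    """Return True if ngram fits query; '?' is assumed to match one word."""
    return len(query) == len(ngram) and all(
        q == "?" or q == w for q, w in zip(query, ngram)
    )


def search(query_text, top_k=3):
    """Return up to top_k (phrase, count, share of all matches) tuples."""
    query = tuple(query_text.split())
    # Naive linear scan over the whole table (hypothetical stand-in
    # for a real index lookup).
    hits = [(ng, n) for ng, n in NGRAMS.items() if matches(query, ng)]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    total = sum(n for _, n in hits)
    return [(" ".join(ng), n, n / total) for ng, n in hits[:top_k]]


if __name__ == "__main__":
    for phrase, count, share in search("waiting ? ? response"):
        print(f"{phrase:28s} {count:5d} {share:6.1%}")
```

A query without wildcards, such as `search("waiting for a response")`, reduces to the phrase verification case: a non-empty result means the phrase occurs in the table, and its count indicates how common it is.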

Index Terms:
probability, data visualisation, information retrieval, underlying corpus, WORDGRAPH, keyword-in-context visualization, NETSPEAK wildcard search, visually choosing phrases, wildcard queries, scalable retrieval engine, probabilistic retrieval strategy, graphical interface, interactive exploration, filter techniques, query expansion, query navigation, phrase context, Visualization, Google, Navigation, Engines, Layout, Indexes, wildcard search, Information visualization, visual queries, text visualization, Web n-grams
Citation:
M. Trenkmann, M. Potthast, H. Gruendl, P. Riehmann, B. Stein, B. Froehlich, "WORDGRAPH: Keyword-in-Context Visualization for NETSPEAK's Wildcard Search," IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 9, pp. 1411-1423, Sept. 2012, doi:10.1109/TVCG.2012.96