Issue No.05 - May (2012 vol.18)
pp: 662-674
D. Oelke, Dept. of Comput. & Inf. Sci., Univ. of Konstanz, Konstanz, Germany
D. Spretke, Dept. of Comput. & Inf. Sci., Univ. of Konstanz, Konstanz, Germany
A. Stoffel, Dept. of Comput. & Inf. Sci., Univ. of Konstanz, Konstanz, Germany
D. A. Keim, Dept. of Comput. & Inf. Sci., Univ. of Konstanz, Konstanz, Germany
ABSTRACT
We present a tool that is specifically designed to support a writer in revising a draft version of a document. In addition to showing which paragraphs and sentences are difficult to read and understand, we assist the writer in understanding why this is the case. This requires features that are expressive predictors of readability and are also semantically understandable. In the first part of the paper, we therefore discuss a semiautomatic feature selection approach that is used to choose appropriate measures from a collection of 141 candidate readability features. In the second part, we present the visual analysis tool VisRA, which allows the user to analyze the feature values across the text and within single sentences. Users can choose between different visual representations that account for differences in the size of the documents and in the availability of information about their physical and logical layout. We put special emphasis on providing as much transparency as possible to ensure that the user can purposefully improve the readability of a sentence. Several case studies show the wide range of applicability of our tool. Furthermore, an in-depth evaluation assesses the quality of the measure and investigates how well users revise a text with the help of the tool.
INDEX TERMS
text analysis, learning (artificial intelligence), document processing, visual readability analysis, semiautomatic feature selection approach, visual analysis tool, visual representations, VisRA, draft version, text processing, Vocabulary, Correlation, Training data, Length measurement, Navigation, Visual analytics, feature evaluation and selection, Document and text processing
CITATION
D. Oelke, D. Spretke, A. Stoffel, D. A. Keim, "Visual Readability Analysis: How to Make Your Writings Easier to Read," IEEE Transactions on Visualization & Computer Graphics, vol. 18, no. 5, pp. 662-674, May 2012, doi:10.1109/TVCG.2011.266