Sequential Document Visualization
November/December 2007 (vol. 13 no. 6)
pp. 1208-1215
Documents and other categorical-valued time series are often characterized by the frequencies of short-range sequential patterns such as n-grams. This representation converts sequential data of varying lengths into high-dimensional histogram vectors, which are easily modeled by standard statistical models. Unfortunately, the histogram representation ignores most medium- and long-range sequential dependencies, making it unsuitable for visualizing sequential data. We present a novel framework for sequential visualization of discrete categorical time series based on the idea of local statistical modeling. The framework embeds categorical time series as smooth curves in the multinomial simplex, summarizing the progression of sequential trends. We discuss several visualization techniques based on this framework and demonstrate their usefulness for document visualization.
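
To make the idea of a curve in the multinomial simplex concrete, the following is a minimal sketch, not the authors' implementation. It assumes one plausible reading of "local statistical modeling": at each normalized location mu in [0, 1], every token is weighted by a Gaussian kernel centered at mu, and the weighted counts are normalized into a word distribution. The function name local_histograms, the Gaussian kernel, and the bandwidth and smoothing parameters are all illustrative assumptions.

```python
import math


def local_histograms(tokens, vocab, num_points=5, bandwidth=0.1, smoothing=0.01):
    """Return num_points word distributions, one per location mu in [0, 1].

    Each distribution is a point in the multinomial simplex; read in order,
    the points trace a discretized curve through the document.
    The Gaussian kernel and additive smoothing are assumptions of this sketch.
    """
    n = len(tokens)
    index = {word: i for i, word in enumerate(vocab)}
    # Map token i to a normalized position in (0, 1).
    positions = [(i + 0.5) / n for i in range(n)]

    curve = []
    for k in range(num_points):
        mu = k / (num_points - 1) if num_points > 1 else 0.5
        # Small additive smoothing keeps every point strictly inside the simplex.
        counts = [smoothing] * len(vocab)
        for token, pos in zip(tokens, positions):
            if token in index:  # tokens outside the vocabulary are skipped
                weight = math.exp(-0.5 * ((pos - mu) / bandwidth) ** 2)
                counts[index[token]] += weight
        total = sum(counts)
        curve.append([c / total for c in counts])
    return curve


if __name__ == "__main__":
    # Toy "document": the first half uses only "a", the second half only "b".
    doc = "a a a a b b b b".split()
    for point in local_histograms(doc, ["a", "b"], num_points=3):
        print([round(p, 3) for p in point])
```

On the toy document, the first point is dominated by "a", the last by "b", and the middle point splits the mass evenly, which is the kind of sequential trend a single global histogram would hide. A larger bandwidth flattens the curve toward the global histogram, while a smaller one follows local word usage more closely.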

Index Terms:
Document visualization, multi-resolution analysis, local fitting.
Citation:
Yi Mao, Joshua Dillon, Guy Lebanon, "Sequential Document Visualization," IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1208-1215, Nov.-Dec. 2007, doi:10.1109/TVCG.2007.70592