Issue No.12 - Dec. (2011 vol.17)
pp: 2412-2421
Shixia Liu , Microsoft Research Asia
Li Tan , Microsoft Research Asia
Conglei Shi , Hong Kong University of Science and Technology
Yangqiu Song , Microsoft Research Asia
Zekai Gao , Zhejiang University / Microsoft Research Asia
Huamin Qu , Hong Kong University of Science and Technology
Xin Tong , Microsoft Research Asia
ABSTRACT
Understanding how topics evolve in text data is an important and challenging task. Although much work has been devoted to topic analysis, the study of topic evolution has largely been limited to individual topics. In this paper, we introduce TextFlow, a seamless integration of visualization and topic mining techniques, for analyzing various evolution patterns that emerge from multiple topics. We first extend an existing analysis technique to extract three-level features: the topic evolution trend, the critical event, and the keyword correlation. Then a coherent visualization that consists of three new visual components is designed to convey complex relationships between them. Through interaction, the topic mining model and visualization can communicate with each other to help users refine the analysis result and gain insights into the data progressively. Finally, two case studies are conducted to demonstrate the effectiveness and usefulness of TextFlow in helping users understand the major topic evolution patterns in time-varying text data.
INDEX TERMS
Text visualization, Topic evolution, Hierarchical Dirichlet process, Critical event.
CITATION
Shixia Liu, Li Tan, Conglei Shi, Yangqiu Song, Zekai Gao, Huamin Qu, Xin Tong, "TextFlow: Towards Better Understanding of Evolving Topics in Text", IEEE Transactions on Visualization & Computer Graphics, vol. 17, no. 12, pp. 2412-2421, Dec. 2011, doi:10.1109/TVCG.2011.239