TSCAN: A Content Anatomy Approach to Temporal Topic Summarization
January 2012 (vol. 24 no. 1)
pp. 170-183
Chien Chin Chen, National Taiwan University, Taipei
Meng Chang Chen, Academia Sinica, Taipei
A topic is defined as a seminal event or activity along with all directly related events and activities. It is represented by a chronological sequence of documents published by different authors on the Internet. In this study, we define a task called topic anatomy, which summarizes and temporally associates the core parts of a topic so that readers can easily understand its content. The proposed topic anatomy model, called TSCAN, derives the major themes of a topic from the eigenvectors of a temporal block association matrix. The significant events of each theme and their summaries are then extracted by examining the constitution of the eigenvectors. Finally, the extracted events are associated through their temporal closeness and context similarity to form an evolution graph of the topic. Experiments on the official TDT4 corpus demonstrate that the generated temporal summaries present the storylines of topics in a comprehensible form. Moreover, when evaluated against human-composed reference summaries, the generated summaries outperform those produced by existing summarization methods in terms of content coverage, coherence, and consistency.
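The three stages described above (themes from eigenvectors, events from eigenvector constitution, linking by temporal closeness and context similarity) can be illustrated with a small sketch. This is not the TSCAN algorithm from the paper, whose exact definitions are not given here; every rule below is an assumption. The association matrix is assumed to be the cosine similarity between term-frequency vectors of consecutive temporal blocks; eigenvectors are found by plain power iteration with deflation; an "event" is assumed to be a maximal run of consecutive blocks whose eigenvector component magnitude reaches `ratio` times the maximum; and two events are linked when the later one starts at most `max_gap` blocks after the earlier one ends and their pooled term vectors have cosine similarity of at least `min_sim`. The function names and thresholds (`anatomy`, `extract_events`, `link_events`, `ratio`, `max_gap`, `min_sim`) are hypothetical, and event summarization is omitted.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse term vectors stored as dicts/Counters."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_eigenvectors(A, k, iters=500, eps=1e-9):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(A)
    M = [row[:] for row in A]
    pairs = []
    for _ in range(k):
        v = [1.0 + i for i in range(n)]  # deterministic, non-uniform start vector
        lam = 0.0
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            lam = math.sqrt(sum(x * x for x in w))  # ||Mv||, the eigenvalue at convergence
            if lam == 0.0:
                break
            v = [x / lam for x in w]
        if lam < eps:
            break  # remaining spectrum is numerically zero: no more themes
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]
    return pairs


def extract_events(v, ratio=0.5):
    """Hypothetical rule: maximal runs of blocks with |component| >= ratio * max |component|."""
    cutoff = ratio * max(abs(x) for x in v)
    events, start = [], None
    for i, x in enumerate(v):
        if abs(x) >= cutoff:
            if start is None:
                start = i
        elif start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(v) - 1))
    return events


def link_events(events, vecs, max_gap=2, min_sim=0.3):
    """Hypothetical rule: edge a -> b if b starts 1..max_gap blocks after a ends and contexts match."""
    def pooled(ev):
        total = Counter()
        for b in range(ev[0], ev[1] + 1):
            total.update(vecs[b])  # sum term counts over the event's blocks
        return total

    edges = []
    for a in events:
        for b in events:
            gap = b[0] - a[1]
            if 0 < gap <= max_gap and cosine(pooled(a), pooled(b)) >= min_sim:
                edges.append((a, b))
    return edges


def anatomy(blocks, k=2):
    """Blocks are token lists in temporal order; returns (events, evolution-graph edges)."""
    vecs = [Counter(tokens) for tokens in blocks]
    A = [[cosine(u, v) for v in vecs] for u in vecs]  # assumed block association matrix
    events = []
    for _, v in top_eigenvectors(A, k):
        events.extend(extract_events(v))
    events = sorted(set(events))
    return events, link_events(events, vecs)


blocks = [["quake", "rescue"]] * 3 + [["aid", "donation"]] * 2
events, edges = anatomy(blocks, k=2)
print(events)  # [(0, 2), (3, 4)]
```

With two disjoint vocabularies, the association matrix is block diagonal, so each leading eigenvector concentrates on one group of blocks and yields one event; the two events share no terms, so no edge is created. Power iteration is used only to keep the sketch dependency-free; a dense symmetric eigensolver would be the usual choice in practice.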

Index Terms:
Database applications: text mining, natural language processing: language summarization, natural language processing: text analysis.
Citation:
Chien Chin Chen, Meng Chang Chen, "TSCAN: A Content Anatomy Approach to Temporal Topic Summarization," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 1, pp. 170-183, Jan. 2012, doi:10.1109/TKDE.2010.228