2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) (2014)
July 13, 2014 to July 15, 2014
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PAAP.2014.22
Latent Dirichlet Allocation (LDA) has recently been used to automatically generate topics for text corpora and has been applied to sentence-extraction-based multi-document summarization algorithms. However, not all of the estimated topics are of equal importance or correspond to genuine themes of the domain. Some topics can be collections of irrelevant or background words, or represent insignificant themes. This paper proposes a topic-sensitive algorithm for multi-document summarization. Our approach is distinguished from existing approaches in that we use the LDA model to identify and distinguish significant topics, which are then used in the sentence weight calculation. Moreover, besides topic characteristics, this approach also considers statistical characteristics such as term frequency, sentence position, and sentence length. The approach therefore combines the advantages of statistical characteristics with those of the LDA topic model. Experiments show that the proposed algorithm achieves better performance than other state-of-the-art algorithms on the DUC2002 corpus.
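The abstract does not give the sentence weight formula, so the following is only a minimal sketch of how a topic feature might be combined with the statistical features it names (term frequency, sentence position, sentence length). The function name `score_sentences`, the linear combination, the feature normalizations, the default `weights`, and `ideal_len` are all assumptions made for illustration, not the authors' method. The per-topic word distributions and the topic significance values are assumed to come from an already fitted LDA model and a separate significance estimate.

```python
from collections import Counter


def score_sentences(sentences, topic_word, topic_significance,
                    weights=(0.5, 0.2, 0.2, 0.1), ideal_len=20):
    """Score tokenized sentences (hypothetical scheme, not the paper's formula).

    sentences: list of token lists, in document order.
    topic_word: list of dicts, one per topic, mapping word -> probability.
    topic_significance: list of floats, one per topic (assumed to be given).
    """
    # Term frequencies over all sentences, used for the frequency feature.
    tf = Counter(tok for sent in sentences for tok in sent)
    max_tf = max(tf.values()) if tf else 1
    n = len(sentences)
    w_topic, w_tf, w_pos, w_len = weights

    scores = []
    for i, sent in enumerate(sentences):
        if not sent:
            scores.append(0.0)
            continue
        # Topic feature: mean significance-weighted word probability, so
        # words from low-significance (background) topics contribute little.
        topic = sum(
            sig * dist.get(tok, 0.0)
            for tok in sent
            for sig, dist in zip(topic_significance, topic_word)
        ) / len(sent)
        # Term-frequency feature: mean frequency, normalized to [0, 1].
        freq = sum(tf[tok] for tok in sent) / (len(sent) * max_tf)
        # Position feature: earlier sentences receive a higher value.
        pos = 1.0 - i / n
        # Length feature: penalize deviation from an assumed ideal length.
        length = 1.0 / (1.0 + abs(len(sent) - ideal_len) / ideal_len)
        scores.append(w_topic * topic + w_tf * freq + w_pos * pos + w_len * length)
    return scores


# Toy usage: topic 0 is treated as significant, topic 1 as background.
toy_sentences = [["the", "of", "and"], ["summarization", "topic", "model"]]
toy_topics = [
    {"summarization": 0.4, "topic": 0.3, "model": 0.3},
    {"the": 0.4, "of": 0.3, "and": 0.3},
]
toy_scores = score_sentences(toy_sentences, toy_topics, [1.0, 0.0])
```

Under these assumptions, setting a topic's significance to zero removes its words from the topic feature entirely, which mirrors the idea of discounting background topics. A summarizer would then select the highest-scoring sentences up to a length budget.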
Computational modeling, Resource management, Probabilistic logic, Probability distribution, Frequency measurement, Length measurement, Bayes methods
N. Liu, X. Tang, Y. Lu, M. Li, H. Wang and P. Xiao, "Topic-Sensitive Multi-document Summarization Algorithm," 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), Beijing, China, 2014, pp. 69-74.