2008 Eighth IEEE International Conference on Data Mining (2008)

Dec. 15, 2008 to Dec. 19, 2008

ISSN: 1550-4786

ISBN: 978-0-7695-3502-9

pp: 713-718

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2008.55

ABSTRACT

Multi-document summarization computes a summary for a set of related articles that gives the user a general view of the events they describe. One objective is for the summary sentences to cover the different events in the documents in as few sentences as possible. Latent Dirichlet Allocation (LDA) can break these documents down into different topics or events. However, to reduce the information shared between them, the sentences of the summary need to be orthogonal to each other, since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decomposition (SVD) is used to obtain orthogonal representations of vectors; by representing sentences as vectors, we can find sentences that are orthogonal to each other in the LDA mixture model weighted term domain. Thus, using LDA we find the different topics in the documents, and using SVD we find the sentences that best represent these topics. Finally, we evaluate the algorithms on the DUC 2002 corpus multi-document summarization tasks using the ROUGE evaluator. Compared to the DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.
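
The orthogonality argument in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the term weights below are made-up toy values standing in for LDA-weighted term vectors, and a greedy pivoted Gram-Schmidt selection is swapped in for the SVD step so the example stays dependency-free. The names `cosine` and `select_orthogonal` are hypothetical helpers introduced only for this illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; 0.0 means the two vectors are orthogonal."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def select_orthogonal(vectors, k):
    """Greedy pivoted Gram-Schmidt (a stand-in for SVD, not the paper's method).

    Repeatedly picks the sentence vector with the largest residual norm,
    then removes its direction from every residual, so each later pick
    carries little of the information already selected.
    """
    residuals = [list(v) for v in vectors]
    chosen = []
    for _ in range(min(k, len(vectors))):
        candidates = [i for i in range(len(residuals)) if i not in chosen]
        best = max(candidates, key=lambda i: dot(residuals[i], residuals[i]))
        norm_sq = dot(residuals[best], residuals[best])
        if norm_sq < 1e-12:  # nothing new left to cover
            break
        chosen.append(best)
        pivot = list(residuals[best])
        for i, r in enumerate(residuals):
            coeff = dot(r, pivot) / norm_sq
            residuals[i] = [a - coeff * b for a, b in zip(r, pivot)]
    return chosen


# Toy sentence vectors over four terms (assumed weights, not real LDA output).
vectors = [
    [2.0, 1.0, 0.0, 0.0],  # sentence 0: first event
    [1.8, 0.9, 0.0, 0.0],  # sentence 1: near-duplicate of sentence 0
    [0.0, 0.0, 1.5, 1.0],  # sentence 2: second event
    [0.0, 0.1, 1.0, 0.5],  # sentence 3: mostly second event
]

picked = select_orthogonal(vectors, 2)
print("selected sentences:", picked)
print("similarity of the picks:", cosine(vectors[picked[0]], vectors[picked[1]]))
```

On this toy input the near-duplicate sentence is skipped in favor of one covering the other event, which is the redundancy-reduction effect the abstract attributes to choosing mutually orthogonal sentences.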

INDEX TERMS

Natural Language Processing, Multi-Document Summarization

CITATION

Rachit Arora,
Balaraman Ravindran,
"Latent Dirichlet Allocation and Singular Value Decomposition Based Multi-document Summarization",

*2008 Eighth IEEE International Conference on Data Mining*, pp. 713-718, 2008, doi:10.1109/ICDM.2008.55