New Feature Sets for Summarization by Sentence Extraction
July/August 2003 (vol. 18 no. 4)
pp. 34-42
Hans van Halteren, University of Nijmegen

Machine learning feature sets that were originally developed for authorship attribution can be used for summarization by sentence extraction. In the author's pilot experiment, these feature sets distinguished significantly better between extract and nonextract sentences than a random baseline classifier, but they had to be carefully combined with other features to outperform a positional baseline classifier. In the DUC 2002 competition, an actual combination system trained on 400-word single-document extracts was one of the best performers in the 200- and 400-word multidocument extraction tasks. Further experiments showed that this system could be improved significantly with training material that better reflected the intended task.

Index Terms:
summarization, sentence extraction, machine learning, style recognition
Hans van Halteren, "New Feature Sets for Summarization by Sentence Extraction," IEEE Intelligent Systems, vol. 18, no. 4, pp. 34-42, July-Aug. 2003, doi:10.1109/MIS.2003.1217626