Fourth International Conference on Document Analysis and Recognition (ICDAR'97)
Extraction of Indicative Summary Sentences from Imaged Documents
Ulm, Germany
August 18-20, 1997
ISBN: 0-8186-7898-4
Francine R. Chen, Xerox Palo Alto Research Center
Dan S. Bloomberg, Xerox Palo Alto Research Center
A system is presented for selecting sentences from an imaged document for presentation as part of a document summary. The extracts are identified without the use of optical character recognition (OCR). Sentences are selected based on a set of discrete features that characterize the words within a sentence and the location of the sentence within the imaged document, and each sentence is scored from the values of these features using a statistically based classifier. The imaged document is processed to identify the word locations, the reading order of the words, and the locations of sentence and paragraph boundaries in the text. The words are then grouped into equivalence classes that mimic the terms of a text document. A sample extract for a technical document is shown, together with an evaluation against a set of abstracts created by a professional abstracting company. These results are compared with text-based abstracts.
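The abstract does not name the statistically based classifier. As a hedged illustration only, the sketch below assumes a naive Bayes classifier over discrete sentence features, a common statistical choice for this kind of scoring. The feature names `para_position`, `has_thematic_word`, and `length_bucket` are hypothetical, and the code works on feature values that have already been computed; it does not perform any of the image processing described above and is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical discrete features; the actual feature set is not listed in the abstract.
FEATURES = ("para_position", "has_thematic_word", "length_bucket")


def train(sentences, labels, alpha=1.0):
    """Estimate P(class) and P(feature = value | class) with add-alpha smoothing.

    sentences: list of dicts mapping each name in FEATURES to a discrete value.
    labels: list of booleans, True if the sentence appears in a reference summary.
    """
    class_counts = Counter(labels)
    value_counts = {c: defaultdict(Counter) for c in (True, False)}
    values = defaultdict(set)  # observed values per feature, for smoothing
    for feats, y in zip(sentences, labels):
        for f in FEATURES:
            value_counts[y][f][feats[f]] += 1
            values[f].add(feats[f])
    return class_counts, value_counts, values, alpha


def score(model, feats):
    """Log posterior odds that a sentence belongs in the summary (naive Bayes)."""
    class_counts, value_counts, values, alpha = model
    total = sum(class_counts.values())
    log_odds = 0.0
    for c, sign in ((True, 1.0), (False, -1.0)):
        lp = math.log((class_counts[c] + alpha) / (total + 2 * alpha))
        for f in FEATURES:
            num = value_counts[c][f][feats[f]] + alpha
            den = class_counts[c] + alpha * len(values[f])
            lp += math.log(num / den)
        log_odds += sign * lp
    return log_odds


def extract(model, sentences, k):
    """Return indices of the k highest-scoring sentences, in document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(model, sentences[i]), reverse=True)
    return sorted(ranked[:k])
```

Scoring with log-odds keeps the product of many small per-feature probabilities numerically stable, and returning the selected indices in document order lets the extract read in the same sequence as the source.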
Index Terms:
document summarization, document retrieval, image interpretation, image analysis, sentence extraction, keyword extraction, pattern classification
Citation:
Francine R. Chen, Dan S. Bloomberg, "Extraction of Indicative Summary Sentences from Imaged Documents," Fourth International Conference on Document Analysis and Recognition (ICDAR'97), pp. 227, 1997.