Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
Identification of Document Structure and Table of Content in Magazine Archives
Seoul, Korea
August 31-September 01
ISBN: 0-7695-2420-6
Sherif Yacoub, HP Labs, Spain
Jose Abad Peiro, Hewlett-Packard, Spain
In this paper, we present a generic approach for reliable identification of the table of contents (TOC) pages in scanned documents. We use multiple sources of information to obtain a reliable assessment of the TOC pages and the position of articles. These sources are produced by three methods: title matching, section keyword matching, and numeric content. Finally, a combination component scores potential TOC pages and selects the best candidates. The system is used to identify the table of contents, locate the beginning of articles, aid advertisement identification (where present), and, in general, identify the structure of scanned documents for article extraction and online deployment of digital content. Results of applying the algorithms to an 80-year archive of Time weekly magazine are presented.
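The combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three scoring heuristics, the weights, and all names (`title_score`, `keyword_score`, `numeric_score`, `rank_toc_candidates`) are assumptions chosen only to show how three independent evidence sources might be merged into one score per page.

```python
import re

def title_score(page_text, known_titles):
    """Fraction of known article titles found on the page (assumed heuristic)."""
    if not known_titles:
        return 0.0
    text = page_text.lower()
    hits = sum(1 for title in known_titles if title.lower() in text)
    return hits / len(known_titles)

def keyword_score(page_text, section_keywords):
    """Fraction of section keywords found as whole words (assumed heuristic)."""
    if not section_keywords:
        return 0.0
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    hits = sum(1 for kw in section_keywords if kw.lower() in words)
    return hits / len(section_keywords)

def numeric_score(page_text, page_count):
    """Fraction of non-empty lines ending in a plausible page number."""
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    hits = 0
    for ln in lines:
        match = re.search(r"(\d+)$", ln)
        if match and 1 <= int(match.group(1)) <= page_count:
            hits += 1
    return hits / len(lines)

def combined_score(page_text, known_titles, section_keywords, page_count,
                   weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three scores; the weights are illustrative only."""
    w_title, w_keyword, w_numeric = weights
    return (w_title * title_score(page_text, known_titles)
            + w_keyword * keyword_score(page_text, section_keywords)
            + w_numeric * numeric_score(page_text, page_count))

def rank_toc_candidates(pages, known_titles, section_keywords):
    """Return page indices sorted from most to least TOC-like."""
    page_count = len(pages)
    scores = [combined_score(p, known_titles, section_keywords, page_count)
              for p in pages]
    return sorted(range(page_count), key=lambda i: scores[i], reverse=True)

# Hypothetical OCR text for a two-page issue.
pages = [
    "Dear reader, this week we look at the news.",
    "CONTENTS\nNation 1\nWorld 2\nBusiness 2\nCover Story: The Election 1",
]
titles = ["The Election"]
keywords = ["Nation", "World", "Business"]
ranking = rank_toc_candidates(pages, titles, keywords)
```

In this toy input the second page ranks first because it matches the known title, contains all section keywords, and has most of its lines ending in a valid page number. A real system would need OCR-tolerant matching rather than exact substring tests.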
Citation:
Sherif Yacoub, Jose Abad Peiro, "Identification of Document Structure and Table of Content in Magazine Archives," Eighth International Conference on Document Analysis and Recognition (ICDAR'05), pp. 1253-1259, 2005