15th International Conference on Pattern Recognition (ICPR'00) - Volume 4
Part-of-Speech Tagging for Table of Contents Recognition
Barcelona, Spain
September 3-8, 2000
ISBN: 0-7695-0750-6
A. Belaïd, LORIA-CNRS
L. Pierron, LORIA-INRIA
N. Valverde, LORIA-ITESOFT
A labeling approach to the automatic recognition of tables of contents (ToCs) is described. A prototype is used for the electronic consultation of scientific papers in a digital library system named Calliope. The method operates on a roughly structured ASCII file produced by OCR. Labeling is based on part-of-speech (POS) tagging. Tagging begins with a primary labeling of text components using specific dictionaries. Significant tags are then grouped into title and author strings and reduced to canonical forms according to contextual rules. Non-labeled tokens are integrated into one field or another, either by applying contextual correction rules or by using a structure model generated from well-detected articles. The prototype performs satisfactorily on different ToC layouts and character recognition qualities. Without manual intervention, a correct segmentation rate of 95.41% and a correct field extraction rate of 81.74% were obtained on 38 journals comprising 2703 articles.
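The pipeline summarized in the abstract (dictionary-based primary labeling, absorption of non-labeled tokens by context, grouping of tags into fields) can be illustrated with a toy sketch. This is not the authors' implementation: the dictionaries, the tag names (`TITLE`, `AUTHOR`, `PAGE`, `UNK`), the nearest-neighbor correction rule, and every function name below are assumptions made purely for illustration.

```python
import re
from itertools import groupby

# Hypothetical toy dictionaries; the paper's actual dictionaries are not shown here.
FIRST_NAMES = {"anna", "louis", "maria"}
TITLE_WORDS = {"tagging", "recognition", "for", "of", "the"}
PAGE_RE = re.compile(r"^\d+$")         # a bare number is assumed to be a page number
INITIAL_RE = re.compile(r"^[A-Z]\.$")  # "A." is assumed to be an author initial


def primary_label(token):
    """Assign a coarse tag to one token using dictionary and pattern lookups."""
    if PAGE_RE.match(token):
        return "PAGE"
    if INITIAL_RE.match(token) or token.lower() in FIRST_NAMES:
        return "AUTHOR"
    if token.lower() in TITLE_WORDS:
        return "TITLE"
    return "UNK"


def resolve_unknowns(tags):
    """Assumed contextual rule: an unlabeled token joins a neighboring field."""
    resolved = list(tags)
    # Forward pass: an unknown token inherits the tag of the token before it.
    for i in range(1, len(resolved)):
        if resolved[i] == "UNK" and resolved[i - 1] != "UNK":
            resolved[i] = resolved[i - 1]
    # Backward pass: leading unknown tokens inherit the tag of the token after them.
    for i in range(len(resolved) - 2, -1, -1):
        if resolved[i] == "UNK" and resolved[i + 1] != "UNK":
            resolved[i] = resolved[i + 1]
    return resolved


def segment_entry(line):
    """Split one ToC line into (tag, text) fields by grouping adjacent equal tags."""
    tokens = line.split()
    tags = resolve_unknowns([primary_label(t) for t in tokens])
    fields = []
    for tag, group in groupby(zip(tags, tokens), key=lambda pair: pair[0]):
        fields.append((tag, " ".join(tok for _, tok in group)))
    return fields


if __name__ == "__main__":
    print(segment_entry("Part-of-Speech Tagging for Recognition A. Belaid 451"))
```

In this sketch, the tokens "Part-of-Speech" and "Belaid" match no dictionary and are absorbed by the adjacent title and author fields respectively. A real system would need far richer dictionaries and rules, plus the structure model mentioned in the abstract, to cope with OCR noise and varied layouts.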
Citation:
A. Belaïd, L. Pierron, N. Valverde, "Part-of-Speech Tagging for Table of Contents Recognition," icpr, vol. 4, pp.4451, 15th International Conference on Pattern Recognition (ICPR'00) - Volume 4, 2000