Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 1
A Segmentation Method for Bibliographic References by Contextual Tagging of Fields
Edinburgh, Scotland
August 03-August 06
ISBN: 0-7695-1960-1
Dominique Besagni, URI, INIST-CNRS
Abdel Belaïd, LORIA-CNRS
Nelly Benet, LORIA-CNRS
In this paper, a method based on part-of-speech (PoS) tagging is used to structure bibliographic references. The method operates on a roughly structured ASCII file produced by OCR. Because reference structure is heterogeneous, the method works bottom-up, without an a priori model, gathering structural elements from basic tags to sub-fields and fields. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to record fields: "authors", "title", "conference name", "date", etc. Unlabelled tokens are integrated into one field or another either by applying PoS correction rules or by using a structure model generated from well-detected records. The designed prototype performs satisfactorily on different record layouts and character recognition qualities. Without manual intervention, 96.6% of words are correctly attributed, and about 75.9% of references are completely segmented, on a set of 2500 references.
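The bottom-up idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag patterns, the three fields covered, the "nearest labelled neighbour" fill rule, and the names `TAG_RULES`, `tag_token`, `fill_unlabelled`, and `segment` are all assumptions made for illustration, since the abstract does not give the actual tag set or correction rules.

```python
import re
from itertools import groupby

# Hypothetical basic tags, each mapped directly to a record field.
# The paper's real tag set and PoS correction rules are not given in the abstract.
TAG_RULES = [
    (re.compile(r"[A-Z]\."), "authors"),                      # an initial such as "D."
    (re.compile(r"(19|20)\d{2}"), "date"),                    # a four-digit year
    (re.compile(r"proc\.|proceedings|conference", re.IGNORECASE),
     "conference name"),                                      # conference cue words
]


def tag_token(token):
    """Return the field suggested by a significant token, or None if unlabelled."""
    for pattern, field in TAG_RULES:
        if pattern.fullmatch(token):
            return field
    return None


def fill_unlabelled(labels):
    """Toy stand-in for the correction step: an unlabelled token takes the
    field of the nearest labelled token to its left, or to its right when
    nothing labelled precedes it."""
    filled = list(labels)
    last = None
    for i, label in enumerate(filled):          # left-to-right pass
        if label is None:
            filled[i] = last
        else:
            last = label
    nxt = None
    for i in range(len(filled) - 1, -1, -1):    # right-to-left pass for leading gaps
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


def segment(reference):
    """Tag tokens, fill the gaps, then merge adjacent tokens of the same field."""
    tokens = reference.split()
    labels = fill_unlabelled([tag_token(t) for t in tokens])
    pairs = zip(labels, tokens)
    return [
        (field, " ".join(tok for _, tok in group))
        for field, group in groupby(pairs, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    for field, text in segment("D. Besagni A. Belaid Proceedings ICDAR 2003"):
        print(f"{field}: {text}")
```

On this toy input the sketch yields three segments: `authors: D. Besagni A. Belaid`, `conference name: Proceedings ICDAR`, and `date: 2003`. It has no title handling and no learned structure model, so it only shows the tag, group, and absorb sequence rather than the accuracy reported above.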
Citation:
Dominique Besagni, Abdel Belaïd, Nelly Benet, "A Segmentation Method for Bibliographic References by Contextual Tagging of Fields," icdar, vol. 1, p. 384, Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 1, 2003