22nd International Conference on Data Engineering (ICDE'06)
Segmentation of Publication Records of Authors from the Web
Atlanta, Georgia
April 03-April 07
ISBN: 0-7695-2570-9
Wei Zhang, University of Illinois at Chicago
Clement Yu, University of Illinois at Chicago
Neil Smalheiser, University of Illinois at Chicago
Vetle Torvik, University of Illinois at Chicago
Publication records are often found in the authors' personal home pages. If such a record is partitioned into a list of semantic fields of authors, title, date, etc., the unstructured texts can be converted into structured data, which can be used in other applications.

In this paper, we present PEPURS, a publication record segmentation system. It adopts a novel "Split and Merge" strategy: a publication record is split into segments; multiple statistical classifiers compute the likelihood of each segment belonging to the different fields; finally, adjacent segments are merged if they belong to the same field. PEPURS introduces punctuation marks and their neighboring text as a new feature to distinguish the different roles the marks play. PEPURS yields high accuracy scores in experiments.
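The "Split and Merge" idea can be illustrated with a minimal Python sketch. This is not the PEPURS implementation: the function names (`split_record`, `score_segment`, `merge_segments`), the three-field label set, and the hand-written scoring rules are all assumptions made for illustration. In particular, the toy scorer stands in for the statistical classifiers described in the abstract and does not model the punctuation-context feature.

```python
import re

# Hypothetical field set; the abstract names authors, title, date, "etc."
FIELDS = ("author", "title", "date")

def split_record(record):
    """Split step: cut after every punctuation mark followed by whitespace."""
    return [s for s in re.split(r"(?<=[.,;:])\s+", record.strip()) if s]

def score_segment(segment):
    """Toy stand-in for the statistical classifiers (assumption, not PEPURS).

    Returns a score per field for one segment.
    """
    core = segment.strip(" .,;:")
    words = core.split()
    scores = dict.fromkeys(FIELDS, 0.0)
    if re.fullmatch(r"(19|20)\d{2}", core):
        scores["date"] = 1.0              # a bare four-digit year
    elif len(words) == 1 and core[:1].isupper():
        scores["author"] = 0.8            # an initial or a single surname
        scores["title"] = 0.2
    else:
        scores["title"] = 0.8             # multi-word phrase
        scores["author"] = 0.2
    return scores

def merge_segments(segments):
    """Merge step: join adjacent segments whose best-scoring field agrees."""
    merged = []
    for seg in segments:
        scores = score_segment(seg)
        label = max(scores, key=scores.get)
        if merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1] + " " + seg)
        else:
            merged.append((label, seg))
    return merged

record = "W. Zhang, C. Yu, Segmentation of publication records, 2006."
fields = merge_segments(split_record(record))
for label, text in fields:
    print(f"{label}: {text}")
```

In this example the period after an initial ("W.") and the comma after a surname both trigger a split, yet the resulting pieces are merged back into one author field because they receive the same label. This is the situation in which the same punctuation mark can play different roles, which the abstract says PEPURS handles with a dedicated feature.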

Citation:
Wei Zhang, Clement Yu, Neil Smalheiser, Vetle Torvik, "Segmentation of Publication Records of Authors from the Web," icde, pp.120, 22nd International Conference on Data Engineering (ICDE'06), 2006