2003 IEEE/WIC International Conference on Web Intelligence (WI'03)
Learning Information Extraction Patterns from Tabular Web Pages without Manual Labelling
Halifax, Canada
October 13-October 17
ISBN: 0-7695-1932-6
Xiaoying Gao, Victoria University of Wellington
Mengjie Zhang, Victoria University of Wellington
Peter Andreae, Victoria University of Wellington
This paper describes a domain-independent approach to automatically constructing information extraction patterns for semi-structured web pages. The approach was tested on three corpora containing a series of tabular web sites from different domains and achieved a success rate of at least 80%. A significant strength of the system is that it can infer extraction patterns from a single training page and does not require any manual labeling of the training page.
Index Terms:
Machine learning; wrapper; semi-structured data; automatic pattern generation
Citation:
Xiaoying Gao, Mengjie Zhang, Peter Andreae, "Learning Information Extraction Patterns from Tabular Web Pages without Manual Labelling," wi, pp.495, 2003 IEEE/WIC International Conference on Web Intelligence (WI'03), 2003