2003 IEEE/WIC International Conference on Web Intelligence (WI'03)
Learning Information Extraction Patterns from Tabular Web Pages without Manual Labelling
Halifax, Canada
October 13-17, 2003
ISBN: 0-7695-1932-6
This paper describes a domain-independent approach to automatically constructing information extraction patterns for semi-structured web pages. The approach was tested on three corpora containing a series of tabular web sites from different domains and achieved a success rate of at least 80%. A significant strength of the system is that it can infer extraction patterns from a single training page and does not require any manual labeling of the training page.
Index Terms:
Machine learning; wrapper; semi-structured data; automatic pattern generation
Citation:
Xiaoying Gao, Mengjie Zhang, Peter Andreae, "Learning Information Extraction Patterns from Tabular Web Pages without Manual Labelling," 2003 IEEE/WIC International Conference on Web Intelligence (WI'03), p. 495, 2003