2003 IEEE/WIC International Conference on Intelligent Agent Technology (IAT'03)
Learning Knowledge Bases for Information Extraction from Multiple Text Based Web Sites
Halifax, Canada
October 13-October 17
ISBN: 0-7695-1931-8
Xiaoying Gao, Victoria University of Wellington
Mengjie Zhang, Victoria University of Wellington
We describe a learning approach for automatically building knowledge bases for information extraction from multiple text-based web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames, and a frame learning algorithm is developed to learn these frames automatically from training examples. Some training examples can be obtained by automatically parsing a number of tabular web pages in the same domain, which greatly reduces time-consuming manual work. The approach was investigated on ten web sites of real estate advertisements and car advertisements; nearly all the information was successfully extracted, with very few false alarms. These results suggest that the knowledge unit frame representation and the frame learning algorithm both work well, that a domain-specific knowledge base can be learned from training examples, and that such a knowledge base can be used for information extraction from flexible, text-based, semi-structured web pages on multiple web sites.
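The idea of learning frames from tabular pages and applying them to free text can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the structure of a knowledge unit frame, so the `KnowledgeUnitFrame` class, the `learn_frames` and `extract` functions, and the simple value-matching strategy are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnitFrame:
    """Hypothetical frame: one domain attribute plus the values seen for it."""
    name: str
    values: set = field(default_factory=set)


def learn_frames(rows):
    """Build one frame per attribute from tabular training rows.

    `rows` is a list of dicts, standing in for records parsed from
    tabular web pages in the same domain (an assumption of this sketch).
    """
    frames = {}
    for row in rows:
        for attr, value in row.items():
            frame = frames.setdefault(attr, KnowledgeUnitFrame(attr))
            frame.values.add(value.lower())
    return frames


def extract(text, frames):
    """Return {attribute: [known values found as whole words in text]}."""
    lowered = text.lower()
    found = {}
    for name, frame in frames.items():
        hits = sorted(
            v for v in frame.values
            if re.search(r"\b" + re.escape(v) + r"\b", lowered)
        )
        if hits:
            found[name] = hits
    return found


# Training rows as they might come from a tabular car-advertisement page.
rows = [
    {"make": "Toyota", "colour": "red"},
    {"make": "Honda", "colour": "blue"},
]
frames = learn_frames(rows)
result = extract("For sale: red Toyota Corolla, low mileage", frames)
```

Here `result` maps `make` to `["toyota"]` and `colour` to `["red"]`. A real system would need to generalise beyond literal values seen in training (for example, to prices or addresses), which this sketch does not attempt.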
Index Terms:
Information extraction; learning; knowledge unit frame; text-based web sites; semi-structured data.
Citation:
Xiaoying Gao, Mengjie Zhang, "Learning Knowledge Bases for Information Extraction from Multiple Text Based Web Sites," iat, pp.119, 2003 IEEE/WIC International Conference on Intelligent Agent Technology (IAT'03), 2003