2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)
Automatically Generating Labeled Examples for Web Wrapper Maintenance
Compiègne University of Technology, France
September 19-September 22
ISBN: 0-7695-2415-X
Juan Raposo, University of A Coruña
Alberto Pan, University of A Coruña
Manuel Álvarez, University of A Coruña
Justo Hidalgo, Denodo Technologies Inc.
For software programs to gain full benefit from semi-structured web sources, wrapper programs must be built to provide a "machine-readable" view over them. A significant problem with this approach is that, since web sources are autonomous, they may undergo changes that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal wrapper operation and, when the source changes, uses them as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper. Our experiments show that the proposed techniques achieve high accuracy for a wide range of real-world web data extraction problems.
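The general idea of reusing stored query results to label a changed page can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify its heuristics, so the exact-substring matching, the `LabeledExample` type, the `generate_labeled_examples` function, and the sample data below are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabeledExample:
    field: str   # name of the extracted field, e.g. "title"
    value: str   # value stored from an earlier, successful extraction
    start: int   # character offset of the value in the changed page
    end: int     # offset one past the last matched character


def generate_labeled_examples(stored_results, new_page):
    """Label occurrences of previously extracted values in a changed page.

    stored_results: list of dicts (field name -> value) saved while the
    old wrapper still worked. new_page: text of the page after the change.
    Exact substring matching is an assumption made for this sketch only.
    """
    examples = []
    for record in stored_results:
        for field, value in record.items():
            # Skip empty values; find() returns -1 when the value is absent.
            start = new_page.find(value) if value else -1
            if start != -1:
                examples.append(
                    LabeledExample(field, value, start, start + len(value))
                )
    return examples


# Hypothetical results collected during normal wrapper operation.
stored = [
    {"title": "Dune", "price": "9.99"},
    {"title": "Emma", "price": "4.50"},
]
# Hypothetical page after the source changed its markup.
page = "<li><b>Dune</b> <i>9.99</i></li><li><b>Ulysses</b> <i>7.25</i></li>"

examples = generate_labeled_examples(stored, page)
for ex in examples:
    print(ex.field, repr(page[ex.start:ex.end]))
```

In this sketch, only stored values that still appear on the changed page yield labeled examples. A record that is no longer listed contributes nothing, which is why a system of this kind would need to store enough query results for some of them to survive a source change.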
Citation:
Juan Raposo, Alberto Pan, Manuel Álvarez, Justo Hidalgo, "Automatically Generating Labeled Examples for Web Wrapper Maintenance," wi, pp.250-256, 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), 2005