2004 IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)
Efficient Wrapper Reinduction from Dynamic Web Sources
Beijing, China
September 20-24, 2004
ISBN: 0-7695-2100-2
Roshni Mohapatra, Institute for Infocomm Research, Singapore
Kanagasabai Rajaraman, Institute for Infocomm Research, Singapore
Sung Sam Yuan, National University of Singapore
This paper investigates wrapper induction from web sites whose layout may change over time. We formulate reinduction as an incremental learning problem and identify wrapper induction from an incomplete label as a key problem to be solved. We propose a novel algorithm for incrementally inducing LR wrappers and show that it asymptotically identifies the correct wrapper as the number of tuples increases. We use this property to propose an LR wrapper reinduction algorithm. The algorithm requires examples to be provided exactly once; thereafter, it can detect layout changes and reinduce wrappers automatically. In experimental studies, we observe that the reinduction algorithm achieves near-perfect performance.
Citation:
Roshni Mohapatra, Kanagasabai Rajaraman, Sung Sam Yuan, "Efficient Wrapper Reinduction from Dynamic Web Sources," 2004 IEEE/WIC/ACM International Conference on Web Intelligence (WI'04), pp. 391-397, 2004