OLERA: Semisupervised Web-Data Extraction with Visual Support
November/December 2004 (vol. 19 no. 6)
pp. 56-64
Chia-Hui Chang, National Central University, Taiwan
Shih-Chien Kuo, Trend Micro, Taiwan
Extracting information from semistructured Web documents is an important task for many information agents. Over the past few years, researchers have developed an extensive family of generic information extraction (IE) techniques based on supervised approaches that learn extraction rules from user-labeled training examples. However, annotating training data can be expensive when thousands of data sources must be wrapped. OLERA, a semisupervised IE system, produces extraction rules without detailed annotation of the training documents. Instead, it takes as its example a rough segment that contains everything to be extracted from one record. OLERA also provides visualization support: it displays the discovered records in a spreadsheet-like table for schema assignment. Experiments show that OLERA performs well on program-generated Web pages with very few training pages and little user intervention.
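To illustrate how string alignment can lay records out in spreadsheet-like columns, here is a minimal sketch. It is not OLERA's algorithm: it aligns only two token sequences with a standard global alignment (Needleman-Wunsch style), whereas the article's index terms refer to multiple string alignment. The function name `align`, the token encoding (`TEXT` standing for any text node), and the scoring values are assumptions made for this example.

```python
def align(a, b, gap="-"):
    """Globally align two token lists.

    Assumed scoring (illustrative only): match +1, mismatch -1, gap -1.
    Returns two equal-length rows; `gap` marks an empty cell.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for placing a[i-1] and b[j-1] in the same column.
        return 1 if a[i - 1] == b[j - 1] else -1

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i
    for j in range(1, m + 1):
        score[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # same column
                score[i - 1][j] - 1,              # gap in b
                score[i][j - 1] - 1,              # gap in a
            )

    # Trace back from the bottom-right corner to recover the columns.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] - 1:
            row_a.append(a[i - 1])
            row_b.append(gap)
            i -= 1
        else:
            row_a.append(gap)
            row_b.append(b[j - 1])
            j -= 1
    return row_a[::-1], row_b[::-1]


# Two hypothetical records; the second lacks the italic field.
rec1 = ["<b>", "TEXT", "</b>", "<i>", "TEXT", "</i>"]
rec2 = ["<b>", "TEXT", "</b>"]
row1, row2 = align(rec1, rec2)
for row in (row1, row2):
    print("\t".join(row))
```

Each output row has the same number of cells, so the two records can be shown one above the other as table rows, with a gap wherever a record has no token for a column.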
Index Terms:
semistructured data, Web data extraction, multiple string alignment, rule generalization
Citation:
Chia-Hui Chang, Shih-Chien Kuo, "OLERA: Semisupervised Web-Data Extraction with Visual Support," IEEE Intelligent Systems, vol. 19, no. 6, pp. 56-64, Nov.-Dec. 2004, doi:10.1109/MIS.2004.71