OLERA: Semisupervised Web-Data Extraction with Visual Support
November/December 2004 (vol. 19 no. 6)
pp. 56-64
BibTeX:
@article{ 10.1109/MIS.2004.71,
  author = {Chia-Hui Chang and Shih-Chien Kuo},
  title = {OLERA: Semisupervised Web-Data Extraction with Visual Support},
  journal = {IEEE Intelligent Systems},
  volume = {19},
  number = {6},
  issn = {1541-1672},
  year = {2004},
  pages = {56-64},
  doi = {http://doi.ieeecomputersociety.org/10.1109/MIS.2004.71},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MIS.2004.71
Extracting information from semistructured Web documents is an important task for many information agents. Over the past few years, researchers have developed an extensive family of generic information extraction (IE) techniques based on supervised approaches that learn extraction rules from user-labeled training examples. However, annotating training data can be expensive when thousands of data sources must be wrapped. OLERA, a semisupervised IE system, produces extraction rules without detailed annotation of the training documents. Instead, the user supplies as an example a rough segment that contains everything to be extracted from one record. OLERA also provides visualization support: it displays the discovered records in a spreadsheet-like table for schema assignment. Experiments show that OLERA performs well on program-generated Web pages with very few training pages and little user intervention.
Index Terms:
semistructured data, Web data extraction, multiple string alignment, rule generalization
Citation:
Chia-Hui Chang, Shih-Chien Kuo, "OLERA: Semisupervised Web-Data Extraction with Visual Support," IEEE Intelligent Systems, vol. 19, no. 6, pp. 56-64, Nov.-Dec. 2004, doi:10.1109/MIS.2004.71

