String Processing and Information Retrieval, International Symposium on (1999)
Cancun, Mexico
Sept. 21, 1999 to Sept. 24, 1999
ISBN: 0-7695-0268-7
pp: 176
Berthier Ribeiro-Neto , Federal University of Minas Gerais
Alberto H. F. Laender , Federal University of Minas Gerais
Altigran S. da Silva , Federal University of Minas Gerais
ABSTRACT
In this paper, we propose an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. We propose a top-down strategy that extracts complex objects by decomposing them into less complex objects until atomic objects have been extracted. Through experimentation, we demonstrate that, given only a small number of examples, our strategy is able to extract most of the objects present in a Web source given as input.
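The abstract does not say how the example objects are turned into extraction rules, so the following is only a minimal Python sketch of the general idea (examples in, top-down decomposition out), not the authors' algorithm. It assumes, as a swapped-in technique, LR-wrapper-style delimiters: for each object it learns the text that consistently appears just before and just after the example occurrences. It also assumes at least two examples, each of whose atomic values occurs verbatim in the training page. All names (`Node`, `locate`, `learn`, `pattern`, `extract`) and the sample HTML are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from os.path import commonprefix  # character-wise longest common prefix


@dataclass
class Node:
    """Hypothetical schema node: complex if it has children, atomic otherwise."""
    name: str
    children: list[Node] = field(default_factory=list)
    left: str = ""   # learned text expected immediately before the object
    right: str = ""  # learned text expected immediately after the object

    @property
    def atomic(self) -> bool:
        return not self.children


def common_suffix(strings: list[str]) -> str:
    return commonprefix([s[::-1] for s in strings])[::-1]


def locate(node: Node, example, page: str, lo: int, hi: int) -> tuple[int, int]:
    """Span of an example object inside page[lo:hi] (first occurrence assumed)."""
    if node.atomic:
        start = page.find(example, lo, hi)
        if start < 0:
            raise ValueError(f"example value {example!r} not found")
        return start, start + len(example)
    spans = [locate(c, example[c.name], page, lo, hi) for c in node.children]
    return min(s for s, _ in spans), max(e for _, e in spans)


def learn(node: Node, page: str, examples: list, regions: list[tuple[int, int]]) -> None:
    """Learn delimiters for node, then recurse into children within each example's span."""
    spans = [locate(node, ex, page, lo, hi) for ex, (lo, hi) in zip(examples, regions)]
    node.left = common_suffix([page[lo:s] for (lo, _hi), (s, _e) in zip(regions, spans)])
    node.right = commonprefix([page[e:hi] for (_lo, hi), (_s, e) in zip(regions, spans)])
    for child in node.children:
        learn(child, page, [ex[child.name] for ex in examples], spans)


def pattern(node: Node) -> re.Pattern:
    # Lookarounds keep delimiters unconsumed, so adjacent objects can share them.
    head = f"(?<={re.escape(node.left)})" if node.left else r"\A"
    tail = f"(?={re.escape(node.right)})" if node.right else r"\Z"
    return re.compile(head + "(.*?)" + tail, re.DOTALL)


def extract(node: Node, text: str, many: bool = False) -> list:
    """Top-down: match the complex object's region first, then decompose it."""
    regex = pattern(node)
    matches = regex.finditer(text) if many else [m for m in [regex.search(text)] if m]
    results = []
    for m in matches:
        region = m.group(1)
        if node.atomic:
            results.append(region)
        else:
            results.append({c.name: next(iter(extract(c, region)), None)
                            for c in node.children})
    return results


# Usage: two user-supplied example objects drive extraction from a new page.
train = "<ul><li><b>Dune</b> by <i>Herbert</i></li><li><b>Emma</b> by <i>Austen</i></li></ul>"
book = Node("book", [Node("title"), Node("author")])
examples = [{"title": "Dune", "author": "Herbert"}, {"title": "Emma", "author": "Austen"}]
learn(book, train, examples, [(0, len(train))] * len(examples))

new_page = "<ul><li><b>Ubik</b> by <i>Dick</i></li><li><b>Solaris</b> by <i>Lem</i></li></ul>"
result = extract(book, new_page, many=True)
print(result)
# [{'title': 'Ubik', 'author': 'Dick'}, {'title': 'Solaris', 'author': 'Lem'}]
```

Children learn their delimiters relative to the parent's matched region rather than the whole page, which is what lets the decomposition work top-down: once a `book` region is isolated, `title` and `author` are searched only inside it. Real Web sources would need far more robust rules than literal delimiters.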
INDEX TERMS
Semi-Structured Data, Data Extraction
CITATION

A. H. Laender, B. Ribeiro-Neto and A. S. Silva, "Top-Down Extraction of Semi-Structured Data," String Processing and Information Retrieval, International Symposium on (SPIRE), Cancun, Mexico, 1999, pp. 176.
doi:10.1109/SPIRE.1999.796593