String Processing and Information Retrieval, International Symposium on (1999)
Sept. 21, 1999 to Sept. 24, 1999
Berthier Ribeiro-Neto, Federal University of Minas Gerais
Alberto H. F. Laender, Federal University of Minas Gerais
Altigran S. da Silva, Federal University of Minas Gerais
In this paper, we propose an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. We propose a top-down strategy that extracts complex objects by decomposing them into less complex objects until atomic objects have been extracted. Through experimentation, we demonstrate that, given only a small number of examples, our strategy can extract most of the objects present in an input Web source.
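The top-down idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the nested `SCHEMA`, the `extract` function, the `within`/`split`/`parts` keys, and the sample `PAGE` text are all hypothetical, and the patterns are written by hand here, whereas the paper derives its extraction knowledge from user-supplied example objects. The sketch only shows the recursion: a complex object is cut into smaller pieces, and each piece is processed again until an atomic value is reached.

```python
import re

# Hypothetical schema (an assumption for illustration only).
# A string is an atomic object: a regex with one capture group.
# A dict is a complex object: "split" cuts a region into one chunk per
# object, "parts" names the components of each chunk, and the optional
# "within" first narrows the text to the region that holds the objects.
SCHEMA = {
    "split": r"\n\s*\n",  # a blank line separates records
    "parts": {
        "title": r"Title:\s*(.+)",
        "authors": {
            "within": r"Authors:\s*(.+)",
            "split": r"\s*;\s*",
            "parts": {"name": r"(.+)"},
        },
    },
}


def extract(text, schema):
    """Extract objects top-down, recursing until atomic objects are reached."""
    if isinstance(schema, str):
        # Atomic object: return the captured value, or None if absent.
        match = re.search(schema, text)
        return match.group(1).strip() if match else None

    # Complex object: optionally narrow to the enclosing region first.
    region = text
    if "within" in schema:
        match = re.search(schema["within"], text)
        if match is None:
            return []
        region = match.group(1)

    objects = []
    for chunk in re.split(schema["split"], region):
        if not chunk.strip():
            continue  # skip empty chunks produced by the split
        # Decompose the chunk into its less complex components.
        objects.append(
            {name: extract(chunk, sub) for name, sub in schema["parts"].items()}
        )
    return objects


# Made-up sample input, not taken from the paper.
PAGE = (
    "Title: First Sample Book\n"
    "Authors: A. One; B. Two\n"
    "\n"
    "Title: Second Sample Book\n"
    "Authors: C. Three"
)

records = extract(PAGE, SCHEMA)
```

Here `records` is a list with one dictionary per record; each dictionary holds an atomic `title` and a nested list of `authors`, which are themselves small objects with an atomic `name`.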
Semi-Structured Data, Data Extraction
A. H. Laender, B. Ribeiro-Neto and A. S. Silva, "Top-Down Extraction of Semi-Structured Data," String Processing and Information Retrieval, International Symposium on (SPIRE), Cancun, Mexico, 1999, pp. 176.