Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2
Incremental Learning of First Order Logic Theories for the Automatic Annotations of Web Documents
Curitiba, Parana, Brazil
September 23-September 26
ISBN: 0-7695-2822-8
F. Esposito, Università degli Studi di Bari, Dipartimento di Informatica
S. Ferilli, Università degli Studi di Bari, Dipartimento di Informatica
N. Di Mauro, Università degli Studi di Bari, Dipartimento di Informatica
T. Basile, Università degli Studi di Bari, Dipartimento di Informatica
Organizing large repositories spread throughout the most diverse Web sites raises the problem of effective storage and efficient retrieval of documents. This can be achieved by selectively extracting from them the significant textual information, contained in specific layout components that in turn depend on the identification of the correct document class. The continuous flow of new and different documents in a weakly structured environment like the Web calls for incrementality, that is, the ability to continuously update or revise faulty knowledge previously acquired, while the need to express structural relations among layout components suggests the exploitation of a powerful symbolic representation language. This paper proposes the application of incremental first-order logic learning techniques in the document layout preprocessing steps, supported by good results obtained in experiments on a real dataset.
Citation:
F. Esposito, S. Ferilli, N. Di Mauro, T. Basile, "Incremental Learning of First Order Logic Theories for the Automatic Annotations of Web Documents," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 1093-1097, 2007