First International Workshop on Document Image Analysis for Libraries (DIAL'04)
Machine Learning Methods for Automatically Processing Historical Documents: From Paper Acquisition to XML Transformation
Palo Alto, California
January 23-24, 2004
ISBN: 0-7695-2088-X
One of the aims of the EU project COLLATE is to design and implement a Web-based collaboratory for archives, scientists and end-users working with digitized cultural material. Since the originals of such material are often unique and scattered across various archives, making them widely accessible is difficult. One solution is to develop intelligent document processing tools that automatically transform printed documents into a Web-accessible form such as XML. Here, we propose using the document processing system WISDOM++, which makes heavy use of machine learning techniques, to perform this task, and we report promising results obtained in preliminary experiments.
Citation:
F. Esposito, D. Malerba, G. Semeraro, S. Ferilli, O. Altamura, T. M. A. Basile, M. Berardi, M. Ceci, N. Di Mauro, "Machine Learning Methods for Automatically Processing Historical Documents: From Paper Acquisition to XML Transformation," in Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL'04), pp. 328, 2004.