Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
Semantics-Based Content Extraction in Typewritten Historical Documents
Seoul, Korea
August 31-September 01, 2005
ISBN: 0-7695-2420-6
This paper presents a flexible approach to extracting content from scanned historical documents using semantic information. The final electronic document is the result of a "digital historical document lifecycle" process, in which the expert knowledge of the historian/archivist user is incorporated at different stages. Results show that such a conversion strategy, which is aided by semantic information specified by the expert user and which enables individual parts of the document to be processed in a specialised way, produces results that are superior in a variety of significant ways to those of document analysis and understanding techniques devised for contemporary documents.
Citation:
A. Antonacopoulos, D. Karatzas, "Semantics-Based Content Extraction in Typewritten Historical Documents," Eighth International Conference on Document Analysis and Recognition (ICDAR'05), pp. 48-53, 2005