Fourth International Conference on Document Analysis and Recognition (ICDAR'97)
Supporting Information Extraction from Printed Documents by Lexico-Semantic Pattern Matching
Ulm, GERMANY
August 18-20, 1997
ISBN: 0-8186-7898-4
Document analysis and understanding (DAU) systems aim not only to recognize text and document structures but also to extract relevant information from a scanned document. Depending on the document class, the information to be extracted may be defined in advance in terms of both syntactic and semantic structures. In this paper we present a system that detects such information and transforms it into a semantic representation. The basic component is a pattern matcher that uses geometric positions to detect phrases in the document. Because matching is based on a Levenshtein distance, the component matches more leniently and is therefore tolerant of OCR errors.
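To illustrate how a Levenshtein distance can make phrase matching tolerant of OCR errors, the following is a minimal sketch and not the system described in the paper. The function names `levenshtein` and `find_phrase`, the per-word threshold `max_distance`, and the sample tokens are all assumptions chosen for illustration. The geometric positions used by the actual matcher are omitted here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def find_phrase(words, phrase, max_distance=1):
    """Return start indices where every word of `phrase` matches the
    corresponding OCR word within `max_distance` edits (hypothetical
    threshold; comparison is case-insensitive)."""
    n = len(phrase)
    hits = []
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        if all(levenshtein(w.lower(), p.lower()) <= max_distance
               for w, p in zip(window, phrase)):
            hits.append(start)
    return hits


# Hypothetical OCR output with two misrecognized characters.
ocr_words = ["Your", "lnvoice", "Numbcr", "4711"]
matches = find_phrase(ocr_words, ["invoice", "number"])
```

With a threshold of one edit per word, the misread tokens "lnvoice" and "Numbcr" still match the pattern "invoice number"; with a threshold of zero, only exact matches would be accepted. A per-word threshold is only one possible design; a threshold relative to word length would be less permissive for short words.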