Fourth International Conference on Document Analysis and Recognition (ICDAR'97)
Recognition of Facsimile Documents using a Database of Robust Features
Ulm, GERMANY
August 18-August 20
ISBN: 0-8186-7898-4
A method for the recognition of poor-quality documents containing touching characters is presented. The method is based on the extraction of independent and robust features from each object of a sample word, where an object consists of a single letter or of several touching letters. By avoiding letter segmentation, the method eliminates errors frequently introduced by segmentation-based approaches. Features are attributed with their position and extent in order to facilitate discrimination between different classes of objects. A method for the automatic construction of a comprehensive database is also presented. From a given dictionary, every possible letter combination is obtained and images of the artificially touching letters are created. These images are subjected to noise and their features are extracted. For recognition, alternatives for each object are found based on the database. Object alternatives are then combined into valid word alternatives using lexicon lookup.
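The two dictionary-driven steps of the abstract, enumerating letter combinations for the database and combining object alternatives into word alternatives by lexicon lookup, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `max_len` limit, and the reading of "letter combination" as a run of adjacent letters occurring in a dictionary word are all assumptions made for illustration; image synthesis, noise, and feature extraction are omitted.

```python
from itertools import product


def letter_combinations(dictionary, max_len=2):
    """Collect every run of 1..max_len adjacent letters found in the dictionary.

    Assumption: a "letter combination" is a contiguous substring of a
    dictionary word; the abstract does not define it more precisely.
    """
    combos = set()
    for word in dictionary:
        for n in range(1, max_len + 1):
            for i in range(len(word) - n + 1):
                combos.add(word[i:i + n])
    return combos


def word_alternatives(object_alternatives, lexicon):
    """Join one candidate per object and keep only strings in the lexicon.

    object_alternatives holds one list of candidate strings per object;
    a candidate may be a single letter or several touching letters.
    """
    candidates = ("".join(parts) for parts in product(*object_alternatives))
    return sorted({w for w in candidates if w in lexicon})


# Hypothetical data: three objects, the last read as either "rn" or "m".
lexicon = {"corn", "com", "can"}
combos = letter_combinations(lexicon, max_len=2)
object_alternatives = [["c", "e"], ["o", "a"], ["rn", "m"]]
print(word_alternatives(object_alternatives, lexicon))  # ['com', 'corn']
```

The exhaustive product is adequate only for short candidate lists; a practical system would presumably prune against lexicon prefixes while combining, which the abstract does not detail.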
Index Terms:
Word recognition, Feature extraction, OCR, Segmentation, Automatic database development
Citation:
G. Raza, A. Hennig, N. Sherkat, R. J. Whitrow, "Recognition of Facsimile Documents using a Database of Robust Features," icdar, pp.444, Fourth International Conference on Document Analysis and Recognition (ICDAR'97), 1997