Fourth International Conference on Document Analysis and Recognition (ICDAR'97)
Form Processing based on Background Region Analysis
Ulm, Germany
August 18-20, 1997
ISBN: 0-8186-7898-4
We present a novel approach to processing form documents based on background region analysis. Our goal is form processing that does not depend on line properties. Background regions can be extracted independently of line width or length, and multi-layer analysis, which employs a series of coarse-to-fine background images, makes it possible to extract background regions regardless of small line breaks. We propose two multi-layer analysis algorithms for different situations. The first is applied when registering a form model: it reliably extracts box regions from unfilled forms without using any model. The second is applied during character extraction: using a spatial model of the form, it reliably extracts background regions and re-integrates them if they have been divided by characters written in the boxes. From these re-integrated regions, the exact locations of the character boxes are determined on the input image. In addition to these algorithms, we present a form identification method that uses coarse background images. We implemented the algorithms in a prototype system that processes pre-printed forms. Fifty types of existing forms were tested without any customization. Model registration, character extraction, and form identification were all carried out reliably.
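As a rough illustration of why a coarser background image can tolerate small line breaks, the following Python sketch reduces a binary image and counts background regions at each resolution. It is not the authors' algorithm: the block-wise OR reduction, the 4-connected flood fill, the function names (`coarsen`, `background_regions`, `box_with_gap`), and the 12x12 toy image are all assumptions made for this example.

```python
from collections import deque


def coarsen(img, k):
    """Reduce a binary image (1 = ink, 0 = background) by a factor k.

    Assumed rule: a coarse cell is ink if ANY pixel in its k x k block is ink,
    so gaps narrower than a block tend to disappear.
    """
    h, w = len(img), len(img[0])
    return [
        [
            int(any(img[y][x]
                    for y in range(by, min(by + k, h))
                    for x in range(bx, min(bx + k, w))))
            for bx in range(0, w, k)
        ]
        for by in range(0, h, k)
    ]


def background_regions(img):
    """Count 4-connected regions of background (0) pixels using BFS."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0 or seen[y][x]:
                continue
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                               (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def box_with_gap(size=12, lo=2, hi=9, gap=(2, 5)):
    """Hypothetical test image: one box outline, optionally with a 1-pixel break."""
    img = [[0] * size for _ in range(size)]
    for i in range(lo, hi + 1):
        img[lo][i] = img[hi][i] = 1  # top and bottom edges
        img[i][lo] = img[i][hi] = 1  # left and right edges
    if gap is not None:
        img[gap[0]][gap[1]] = 0      # break the top edge
    return img


if __name__ == "__main__":
    broken = box_with_gap()
    print(background_regions(broken))              # fine layer
    print(background_regions(coarsen(broken, 2)))  # coarse layer
```

On the fine layer the break lets the box interior leak into the surrounding background, so only one region is found. On the coarse layer the break is absorbed into an ink cell and the interior appears as a separate region. A multi-layer scheme along these lines could locate a box at the coarse layer and then refine its boundary at finer layers.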