A Generic System for Form Dropout
November 1996 (vol. 18 no. 11)
pp. 1127-1134

Abstract: Recent advances in intelligent character recognition are enabling us to address many challenging problems in document image analysis, one of which is intelligent form analysis. This paper describes a generic system for form dropout when the filled-in characters or symbols either touch or cross the form frames. We propose a method to separate these characters from form frames whose locations are unknown. Because some of the character strokes touch or cross the form frames, we need to address three issues: 1) localization of form frames; 2) separation of characters and form frames; and 3) reconstruction of broken strokes introduced during separation. The form frame is located automatically by finding long straight lines based on the block adjacency graph. Form frame separation and character reconstruction are also implemented by means of this graph. The proposed system consists of two stages: form structure learning and form dropout. First, a form structure-based template, which includes the form frames, preprinted data areas, and skew angle, is automatically generated from a blank form. With this form template, our system can then extract both handwritten and machine-typed filled-in data. Experimental results on three different types of forms demonstrate the performance of our system. Further, the proposed method is robust to the noise and skew introduced during scanning.
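The abstract does not give the block adjacency graph construction in detail, so the following is only a rough illustration of the three steps it lists (localization, separation, reconstruction), not the authors' method. It assumes a binary image stored as a list of rows, treats any horizontal run of at least `min_length` foreground pixels as a frame line, and keeps a frame pixel only when a stroke continues both directly above and directly below it. The function names and the `min_length` parameter are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (row, start column, end column exclusive)


def horizontal_runs(image: List[List[int]]) -> List[Run]:
    """Return every maximal horizontal run of foreground (1) pixels."""
    runs: List[Run] = []
    for r, row in enumerate(image):
        start = None
        for c, pixel in enumerate(row):
            if pixel and start is None:
                start = c
            elif not pixel and start is not None:
                runs.append((r, start, c))
                start = None
        if start is not None:
            runs.append((r, start, len(row)))
    return runs


def long_line_rows(image: List[List[int]], min_length: int) -> List[int]:
    """Step 1 (localization): rows that hold a run of at least min_length."""
    return sorted({r for r, s, e in horizontal_runs(image) if e - s >= min_length})


def drop_frame_lines(image: List[List[int]], min_length: int) -> List[List[int]]:
    """Steps 2 and 3 (separation, reconstruction), heavily simplified.

    Pixels on long runs are erased unless a non-frame stroke pixel lies
    directly above AND directly below, which is taken as a crossing stroke.
    """
    height = len(image)
    frame = {
        (r, c)
        for r, s, e in horizontal_runs(image)
        if e - s >= min_length
        for c in range(s, e)
    }
    result = [row[:] for row in image]
    for r, c in frame:
        above = r > 0 and image[r - 1][c] == 1 and (r - 1, c) not in frame
        below = r + 1 < height and image[r + 1][c] == 1 and (r + 1, c) not in frame
        if not (above and below):
            result[r][c] = 0
    return result


if __name__ == "__main__":
    # A vertical stroke crossing a one-pixel-thick horizontal frame line.
    form = [
        [0, 0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(long_line_rows(form, min_length=5))
    for row in drop_frame_lines(form, min_length=5):
        print(row)
```

This sketch handles only horizontal, one-pixel-thick, unskewed lines. A real system along the lines the abstract describes would also need vertical frames, thick lines, and skew correction, which is where a graph over merged runs becomes useful.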

Index Terms:
Form processing, learning form structure, document image analysis, segmentation, character reconstruction, block adjacency graph.
Bin Yu, Anil K. Jain, "A Generic System for Form Dropout," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 11, pp. 1127-1134, Nov. 1996, doi:10.1109/34.544084