Layout Recognition of Multi-Kinds of Table-Form Documents
April 1995 (vol. 17 no. 4)
pp. 432-445

Abstract: Many studies have reported that knowledge-based layout recognition methods are very successful at automatically classifying meaningful data in document images. However, these approaches apply to only a single kind of document, because they rely on structure definition information that is specified in advance so that a particular class of documents can be analyzed intelligently. In this paper, we propose a method for recognizing the layout structures of multiple kinds of table-form document images. For this purpose, we introduce a classification tree to manage the relationships among different classes of layout structures. Our recognition system has two modes: layout knowledge acquisition and layout structure recognition. In the layout knowledge acquisition mode, table-form document images are distinguished according to this classification tree, and the structure description trees that specify the logical structures of table-form documents are then generated automatically. In the layout structure recognition mode, individual item fields in the table-form document images are extracted and classified successfully by searching the classification tree and interpreting the structure description tree.
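The two modes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which features the classification tree branches on or how a structure description tree is encoded, so the feature list (row count, column count, cell count), the `Box` cell representation, and every class and function name below are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed input: a form is a list of cell rectangles (left, top, right, bottom)
# already extracted from the ruled lines of the image.
Box = Tuple[int, int, int, int]


@dataclass
class StructureNode:
    """Hypothetical structure description tree node: one named item field."""
    label: str
    cell_index: Optional[int] = None  # set on leaves: which cell holds the field
    children: List["StructureNode"] = field(default_factory=list)


@dataclass
class ClassNode:
    """Hypothetical classification tree node: one feature value per level."""
    children: Dict[int, "ClassNode"] = field(default_factory=dict)
    structure: Optional[StructureNode] = None  # set on nodes that identify a class


def features(cells: List[Box]) -> List[int]:
    """Assumed coarse-to-fine features: row count, column count, cell count."""
    rows = len({top for _, top, _, _ in cells})
    cols = len({left for left, _, _, _ in cells})
    return [rows, cols, len(cells)]


def acquire(root: ClassNode, cells: List[Box], labels: List[str]) -> None:
    """Knowledge acquisition mode: file the form under its feature path and
    store a structure description tree for that class."""
    node = root
    for value in features(cells):
        node = node.children.setdefault(value, ClassNode())
    leaves = [StructureNode(label, index) for index, label in enumerate(labels)]
    node.structure = StructureNode("form", children=leaves)


def recognize(root: ClassNode, cells: List[Box]) -> Optional[Dict[str, Box]]:
    """Recognition mode: search the classification tree, then interpret the
    stored structure description tree to name each item field."""
    node: Optional[ClassNode] = root
    for value in features(cells):
        node = node.children.get(value)
        if node is None:
            return None  # unknown document class
    if node.structure is None:
        return None
    fields: Dict[str, Box] = {}
    stack = [node.structure]
    while stack:
        current = stack.pop()
        if current.cell_index is not None:
            fields[current.label] = cells[current.cell_index]
        stack.extend(current.children)
    return fields


# Usage: register one two-by-two form, then recognize a form of the same class.
root = ClassNode()
form = [(0, 0, 50, 10), (50, 0, 100, 10), (0, 10, 50, 20), (50, 10, 100, 20)]
acquire(root, form, ["name_label", "name_value", "date_label", "date_value"])
found = recognize(root, form)
unknown = recognize(root, [(0, 0, 100, 10)])
```

In this sketch a form whose feature path was never acquired yields `None`, which is one simple way to signal that the knowledge acquisition mode should be run for a new document class.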


Index Terms:
Recognition paradigm for multi-kinds of table-form documents, automatic acquisition of layout knowledge, recognition of document classes, recognition of layout structures, classification tree, structure description tree.
Citation:
Toyohide Watanabe, Qin Luo, Noboru Sugie, "Layout Recognition of Multi-Kinds of Table-Form Documents," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 4, pp. 432-445, April 1995, doi:10.1109/34.385976