Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
Table Structure Analysis Based on Cell Classification and Cell Modification for XML Document Transformation
Seoul, Korea
August 31-September 01
ISBN: 0-7695-2420-6
Yasuto ISHITANI, Toshiba Corporation, Japan
Kosei FUME, Toshiba Corporation, Japan
Kazuo SUMITA, Toshiba Corporation, Japan
This paper proposes a new method of table structure analysis based on cell classification and cell modification, intended as the basis of an OCR system that converts a variety of printed tables into XML documents conforming to a specified XML schema. The method proceeds as follows. First, cell features defined by ruled lines, which correspond to data fields, are extracted from the input image of a table. Each cell is then classified to identify irregular tables, whose ruled lines do not form a grid, and such tables are modified into a regular cell arrangement. Next, the hierarchical table structure, consisting of a regular row structure of cells, is extracted from the modified regular table and described as a DOM tree; logical objects within a cell are extracted and converted into sub-trees of that DOM tree. Finally, the DOM tree is transformed into the target XML document by an XML parser with an information extraction process. Experimental results show that the method is effective in transforming various printed tables into various XML documents.
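The steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the cells have already been extracted from the image and are given in grid coordinates, it treats a cell as irregular simply when it spans more than one row or column, and it regularizes by splitting such a cell into unit cells that repeat its text. The names `Cell`, `is_irregular`, `regularize`, and `grid_to_dom`, and the element names `table`, `row`, and `cell`, are all hypothetical. The final schema-specific transformation and information extraction step is not shown.

```python
from dataclasses import dataclass
from xml.dom.minidom import Document


@dataclass
class Cell:
    """A ruled-line cell in grid coordinates (hypothetical representation)."""
    row: int
    col: int
    rowspan: int
    colspan: int
    text: str


def is_irregular(cell):
    # Assumption: a cell counts as irregular if it spans several rows or columns.
    return cell.rowspan > 1 or cell.colspan > 1


def regularize(cells):
    """Split every spanning cell into unit cells that repeat its text."""
    grid = {}
    for c in cells:
        for r in range(c.row, c.row + c.rowspan):
            for k in range(c.col, c.col + c.colspan):
                grid[(r, k)] = c.text
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(k for _, k in grid) + 1
    # Positions not covered by any cell become empty strings.
    return [[grid.get((r, k), "") for k in range(n_cols)] for r in range(n_rows)]


def grid_to_dom(grid):
    """Describe the regular row structure as a DOM tree: table > row > cell."""
    doc = Document()
    table = doc.createElement("table")
    doc.appendChild(table)
    for row in grid:
        row_el = doc.createElement("row")
        table.appendChild(row_el)
        for text in row:
            cell_el = doc.createElement("cell")
            cell_el.appendChild(doc.createTextNode(text))
            row_el.appendChild(cell_el)
    return doc


# A small irregular table: "Name" spans two rows, "Score" spans two columns.
cells = [
    Cell(0, 0, 2, 1, "Name"),
    Cell(0, 1, 1, 2, "Score"),
    Cell(1, 1, 1, 1, "Math"),
    Cell(1, 2, 1, 1, "Art"),
]
grid = regularize(cells)
xml_text = grid_to_dom(grid).documentElement.toxml()
```

Repeating the text of a split cell is only one possible modification rule; the paper's own classification and modification rules may differ.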
Citation:
Yasuto ISHITANI, Kosei FUME, Kazuo SUMITA, "Table Structure Analysis Based on Cell Classification and Cell Modification for XML Document Transformation," Eighth International Conference on Document Analysis and Recognition (ICDAR'05), pp. 1247-1252, 2005