Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2
Table Recognition and Understanding from PDF Files
Curitiba, Parana, Brazil
September 23-September 26
ISBN: 0-7695-2822-8
T. Hassan, Vienna University of Technology
R. Baumgartner, Vienna University of Technology
We propose a flexible method for detecting and understanding tables in PDF files, which is not reliant upon one particular feature being present, for example ruling lines or indentations, and is therefore applicable to a wide variety of visual presentations. We describe the steps required in transforming the low-level PDF instructions into text segments, lines and boxes on a page. We propose three different classifications for published tables, and develop methods to detect these tables and correctly identify their respective rows and columns. We also explain how to recognize spanning rows and columns, and multi-line rows. Experimental results show that our algorithm is effective in converting a wide variety of tabular presentations into HTML for information extraction purposes.
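To make the idea of turning positioned text segments into rows, columns and HTML more concrete, here is a minimal sketch. It is not the authors' algorithm: it simply merges overlapping vertical and horizontal extents into row and column bands, which is one naive way to build a grid. The `Segment` type, the function names, and the top-left coordinate origin are all assumptions made for illustration, and the sketch does not handle spanning cells or multi-line rows, both of which the paper addresses.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Segment:
    """Hypothetical text segment with a bounding box (origin assumed top-left)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def merge_intervals(intervals):
    """Merge overlapping or touching 1-D intervals into sorted bands."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def band_index(bands, lo, hi):
    """Return the index of the band that contains the interval [lo, hi]."""
    for i, (band_lo, band_hi) in enumerate(bands):
        if lo >= band_lo and hi <= band_hi:
            return i
    raise ValueError("interval not covered by any band")


def segments_to_grid(segments):
    """Assign each segment to a (row, column) cell by band membership."""
    row_bands = merge_intervals([(s.y0, s.y1) for s in segments])
    col_bands = merge_intervals([(s.x0, s.x1) for s in segments])
    grid = [["" for _ in col_bands] for _ in row_bands]
    # Visit segments in reading order so text sharing a cell is joined sensibly.
    for s in sorted(segments, key=lambda s: (s.y0, s.x0)):
        r = band_index(row_bands, s.y0, s.y1)
        c = band_index(col_bands, s.x0, s.x1)
        grid[r][c] = (grid[r][c] + " " + s.text).strip()
    return grid


def grid_to_html(grid):
    """Serialise the grid as a plain HTML table, escaping cell text."""
    rows = [
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in grid
    ]
    return "<table>" + "".join(rows) + "</table>"


example_segments = [
    Segment("Name", 0, 0, 30, 10),
    Segment("Qty", 50, 0, 70, 10),
    Segment("Apple", 0, 12, 35, 22),
    Segment("3", 55, 12, 60, 22),
]
example_grid = segments_to_grid(example_segments)
example_html = grid_to_html(example_grid)
```

Because a cell that spans several columns would bridge the gaps between them and collapse those columns into one band, a band-merging approach like this only works for simple grids; that limitation is the kind of case a more robust method needs to treat explicitly.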
Citation:
T. Hassan, R. Baumgartner, "Table Recognition and Understanding from PDF Files," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 1143-1147, 2007