Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
A Model for Detecting and Merging Vertically Spanned Table Cells in Plain Text Documents
Seoul, Korea
August 31-September 1, 2005
ISBN: 0-7695-2420-6
Vanessa Long, Macquarie University Sydney, Australia
Robert Dale, Macquarie University Sydney, Australia
Steve Cassidy, Macquarie University Sydney, Australia
A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a significant cause of error in the extraction of tables from free text documents. In this paper, we present a model for the detection and merging of vertically spanned cells for tables presented in plain text documents. Our model and algorithm are based purely on the layout features of the tables, and they require no semantic understanding of the documents. When tested on the 98 tables appearing in 40 randomly selected documents from a corpus of company announcements from the Australian Stock Exchange (ASX), our algorithm achieves an accuracy of 86.79% in detecting and merging vertically spanned cells.
Citation:
Vanessa Long, Robert Dale, Steve Cassidy, "A Model for Detecting and Merging Vertically Spanned Table Cells in Plain Text Documents," Eighth International Conference on Document Analysis and Recognition (ICDAR'05), pp. 1242-1246, 2005