2011 International Conference on Document Analysis and Recognition
Data Extraction from Web Tables: The Devil is in the Details
Beijing, China
September 18-September 21
ISBN: 978-0-7695-4520-2
BibTeX:
@article{10.1109/ICDAR.2011.57,
  author = {George Nagy and Sharad Seth and Dongpu Jin and David W. Embley and Spencer Machado and Mukkai Krishnamoorthy},
  title = {Data Extraction from Web Tables: The Devil is in the Details},
  journal = {Document Analysis and Recognition, International Conference on},
  volume = {0},
  year = {2011},
  issn = {1520-5363},
  pages = {242-246},
  doi = {http://doi.ieeecomputersociety.org/10.1109/ICDAR.2011.57},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2011.57
We present a method based on header paths for efficient and complete extraction of labeled data from tables meant for humans. Although many table configurations yield to the proposed syntactic analysis, some require access to semantic knowledge. Clicking on one or two critical cells per table, through a simple interface, is sufficient to resolve most of these problem tables. Header paths, a purely syntactic representation of visual tables, can be transformed ("factored") into existing representations of structured data such as category trees, relational tables, and RDF triples. From a random sample of 200 web tables from ten large statistical web sites, we generated 376 relational tables and 34,110 subject-predicate-object RDF triples.
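As a rough illustration of the idea in the abstract, the sketch below treats a header path as the sequence of header labels leading to a data row or column, then flattens a toy table into subject-predicate-object triples and relational rows. It is a simplified reading under stated assumptions, not the paper's factoring algorithm: the table, its values, and the function names `header_paths`, `to_triples`, and `to_relation` are all invented for this example, and spanning header cells are assumed to have been expanded already.

```python
# Toy illustration only: the table, its values, and all names are invented.
from itertools import product

# Column header rows of a small visual table. Spanning cells are assumed to be
# expanded already, so every data column has one label per header row.
col_header_rows = [
    ["2010", "2010", "2011", "2011"],
    ["Male", "Female", "Male", "Female"],
]
# Row header columns, with spanning cells expanded in the same way.
row_header_cols = [
    ["Canada", "Canada"],
    ["Ontario", "Quebec"],
]
# Data region: one row per row header path, one column per column header path.
data = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
]


def header_paths(levels):
    """Return one top-to-bottom label path per data column (or data row)."""
    return [tuple(labels) for labels in zip(*levels)]


def to_triples(row_paths, col_paths, cells):
    """Emit one (subject, predicate, object) triple per data cell."""
    triples = []
    for (i, rp), (j, cp) in product(enumerate(row_paths), enumerate(col_paths)):
        triples.append(("/".join(rp), "/".join(cp), cells[i][j]))
    return triples


def to_relation(row_paths, col_paths, cells):
    """Flatten to relational rows: one attribute per header level, then the value."""
    return [
        rp + cp + (cells[i][j],)
        for i, rp in enumerate(row_paths)
        for j, cp in enumerate(col_paths)
    ]


col_paths = header_paths(col_header_rows)
row_paths = header_paths(row_header_cols)
triples = to_triples(row_paths, col_paths, data)
relation = to_relation(row_paths, col_paths, data)
```

Here each data cell is addressed by exactly one row path and one column path, which is what makes the same cell expressible either as a triple or as a tuple in a relation. Joining path labels with `/` to form the subject and predicate is a convenience of this sketch, not a convention taken from the paper.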
Index Terms:
visual table, relational table, RDF, header-paths
Citation:
George Nagy, Sharad Seth, Dongpu Jin, David W. Embley, Spencer Machado, Mukkai Krishnamoorthy, "Data Extraction from Web Tables: The Devil is in the Details," 2011 International Conference on Document Analysis and Recognition (ICDAR), pp. 242-246, 2011.
