Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2
Layout Based Information Extraction from HTML Documents
Curitiba, Parana, Brazil
September 23-September 26
ISBN: 0-7695-2822-8
R. Burget, Brno University of Technology
We propose a method of information extraction from HTML documents based on modelling the visual information in the document. A page segmentation algorithm is used to detect the document layout; the extraction process is then based on analysis of the mutual positions of the detected blocks and their visual features. This approach is more robust than the traditional DOM-based methods and opens new possibilities for specifying the extraction task.
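As a rough illustration of what extraction by mutual block positions could look like, the sketch below assumes the page has already been segmented into rectangular blocks and looks up a value by finding the nearest block to the right of a label on the same visual line. It is not the algorithm from the paper: the `Block` type, the `value_right_of` function, and the sample coordinates are all hypothetical, and visual features such as font size or colour are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Block:
    """A rectangular visual block (hypothetical output of a segmentation step)."""
    text: str
    x: float       # left edge in pixels
    y: float       # top edge in pixels
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height


def vertically_overlaps(a: Block, b: Block) -> bool:
    """True if the two blocks share some vertical extent (same visual line)."""
    return a.y < b.bottom and b.y < a.bottom


def value_right_of(blocks: Sequence[Block], label: str) -> Optional[str]:
    """Return the text of the nearest block to the right of the label block."""
    label_block = next((b for b in blocks if b.text.strip() == label), None)
    if label_block is None:
        return None
    candidates = [
        b for b in blocks
        if b is not label_block
        and b.x >= label_block.right
        and vertically_overlaps(label_block, b)
    ]
    if not candidates:
        return None
    # Pick the candidate with the smallest horizontal gap to the label.
    return min(candidates, key=lambda b: b.x - label_block.right).text


# Made-up layout: two label/value pairs on separate visual lines.
sample_blocks = [
    Block("Title:", 10, 60, 50, 20),
    Block("Layout Analysis", 70, 60, 120, 20),
    Block("Price:", 10, 100, 50, 20),
    Block("$25", 70, 100, 40, 20),
]

if __name__ == "__main__":
    print(value_right_of(sample_blocks, "Price:"))  # prints: $25
```

Because the lookup depends only on rendered positions, it would give the same answer whether the label and value sit in table cells, nested `div` elements, or absolutely positioned spans, which is the kind of robustness over DOM-based matching that the abstract refers to.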
Citation:
R. Burget, "Layout Based Information Extraction from HTML Documents," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 624-628, 2007