2008 The Eighth IAPR International Workshop on Document Analysis Systems
Dolores: An Interactive and Class-Free Approach for Document Logical Restructuring
September 16-September 19
ISBN: 978-0-7695-3337-7
Recovering physical and logical structure from electronic documents is still an open issue. In this paper, we propose a flexible and efficient approach for recovering document structures from PDF files. After a brief introduction to the PDF format and its major features, we report on our evaluation of existing tools and prior work on PDF content extraction and analysis. To overcome the weaknesses of these systems, we propose a new analysis strategy based on an intermediate representation, called XCDF, which represents physical structures in a canonical way. The paper then describes the PDF reverse engineering workflow, focusing on logical restructuring of the document. Finally, it concludes with potential future improvements.
Index Terms:
physical structure, logical structure, document restructuring, pdf reengineering
Citation:
Jean-Luc Bloechle, Catherine Pugin, Rolf Ingold, "Dolores: An Interactive and Class-Free Approach for Document Logical Restructuring," das, pp.644-652, 2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008