This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2
The hOCR Microformat for OCR Workflow and Results
Curitiba, Parana, Brazil
September 23-September 26
ISBN: 0-7695-2822-8
T. Breuel, U. Kaiserslautern and DFKI, Germany
Large scale scanning and document conversion ef- forts have led to a renewed interest in OCR systems and workflows. This paper describes a new format for representing both intermediate and final OCR results, developed in response to the needs of a newly developed OCR system and ground truth data release. The format embeds OCR information invisibly inside the HTML and CSS standards and therefore can represent a wide range of linguistic and typographic phenomena with al- ready well-defined, widely understood markup and can be processed using widely available and known tools. The format is based on a new, multi-level abstraction of OCR results based on logical markup, common typeset- ting models, and OCR engine-specific markup, making it suitable both for the support of existing workflows and the development of future model-based OCR engines.
Citation:
T. Breuel, "The hOCR Microformat for OCR Workflow and Results," icdar, vol. 2, pp.1063-1067, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2, 2007
Usage of this product signifies your acceptance of the Terms of Use.