Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2
Content-level Annotation of Large Collection of Printed Document Images
Curitiba, Paraná, Brazil
September 23-26, 2007
ISBN: 0-7695-2822-8
ASCII Text:
A. Kumar, C.V. Jawahar, "Content-level Annotation of Large Collection of Printed Document Images," in Proc. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 799-803, 2007.
BibTeX:
@inproceedings{10.1109/ICDAR.2007.89,
  author = {A. Kumar and C.V. Jawahar},
  title = {Content-level Annotation of Large Collection of Printed Document Images},
  booktitle = {Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)},
  volume = {2},
  year = {2007},
  issn = {1520-5363},
  pages = {799--803},
  doi = {10.1109/ICDAR.2007.89},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RIS (RefWorks, ProCite, RefMan, EndNote):
TY  - CONF
JO  - Document Analysis and Recognition, International Conference on
TI  - Content-level Annotation of Large Collection of Printed Document Images
SN  - 1520-5363
SP  - 799
EP  - 803
A1  - Kumar, A.
A1  - Jawahar, C.V.
PY  - 2007
VL  - 2
JA  - Document Analysis and Recognition, International Conference on
ER  - 
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2007.89
A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creating annotated corpora is tedious, and especially laborious when the annotation is at the character level. In this paper, we propose an efficient hierarchical approach for annotating a large collection of printed document images. We align document images with independently keyed-in text. The method is model-driven and is intended to annotate, at the character level, a large collection of documents scanned at three different resolutions. We employ an XML representation to store the annotation information. APIs provide content-level access, making the annotations easy to use in training and evaluating OCRs and in other document understanding tasks.
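The abstract describes the pipeline only at a high level. The following is a minimal sketch of how its pieces could fit together: a top-down (page, then line, then word) pairing of segmented word boxes with keyed-in text, an XML record of the result, and a small content-level query. It is not the authors' method, schema, or API. The element and attribute names (`page`, `line`, `word`, `bbox`, `status`), the functions `annotate_page` and `find_word`, and the input format are all hypothetical, and the naive count-based pairing stands in for the paper's model-driven alignment, which also goes down to the character level. The `dpi` rescaling in `find_word` assumes the scans of a page differ only in resolution, with no offset or skew; the abstract does not state this.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical input: each text line on the page is a list of word
# bounding boxes (x, y, width, height) in reading order, as produced
# by some upstream segmenter.
Box = Tuple[int, int, int, int]


def annotate_page(page_id: str, dpi: int,
                  line_boxes: List[List[Box]], text: str) -> ET.Element:
    """Pair segmented word boxes with keyed-in text top-down
    (page -> lines -> words). Units are paired in reading order only
    when their counts agree; otherwise the unit is flagged for review
    instead of being guessed. Hypothetical schema, not the paper's."""
    page = ET.Element("page", id=page_id, dpi=str(dpi))
    text_lines = [t for t in text.splitlines() if t.strip()]
    if len(text_lines) != len(line_boxes):
        page.set("status", "needs-review")  # line counts disagree
        return page
    for i, (boxes, line_text) in enumerate(zip(line_boxes, text_lines)):
        line = ET.SubElement(page, "line", id=f"{page_id}.l{i}")
        words = line_text.split()
        if len(words) != len(boxes):
            line.set("status", "needs-review")  # word counts disagree
            continue
        for j, (word, (x, y, w, h)) in enumerate(zip(words, boxes)):
            ET.SubElement(line, "word", id=f"{page_id}.l{i}.w{j}",
                          text=word, bbox=f"{x},{y},{w},{h}")
    return page


def find_word(page: ET.Element, query: str,
              dpi: Optional[int] = None) -> List[Box]:
    """Content-level access: boxes of every occurrence of `query`,
    optionally rescaled to another scan resolution. Rescaling assumes
    the scans differ only in resolution (no offset or skew)."""
    scale = 1.0 if dpi is None else dpi / int(page.get("dpi"))
    hits = []
    for word in page.iter("word"):
        if word.get("text") == query:
            x, y, w, h = (int(v) for v in word.get("bbox").split(","))
            hits.append((round(x * scale), round(y * scale),
                         round(w * scale), round(h * scale)))
    return hits


if __name__ == "__main__":
    boxes = [[(10, 20, 40, 12), (60, 20, 30, 12)], [(10, 40, 50, 12)]]
    page = annotate_page("p1", 300, boxes, "large corpus\nOCR")
    print(ET.tostring(page, encoding="unicode"))
    print(find_word(page, "corpus", dpi=600))  # [(120, 40, 60, 24)]
```

Flagging count mismatches instead of guessing keeps an error in one line from shifting every word after it. A fuller system would resolve the flagged units with a finer alignment (for example, dynamic programming over word widths) and continue down to individual characters.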
