Benchmarking Commercial OCR Engines for Technical Drawings Indexing
Sixth International Conference on Document Analysis and Recognition (ICDAR'01), Seattle, Washington, September 10-13, 2001
ISBN: 0-7695-1263-1
Abstract: The choice of a commercial Optical Character Recognition (OCR) engine is important for automatically indexing technical drawings from their title blocks. We aim to benchmark commercial OCR engines with respect to their inclusion in the global digitalisation chain, from scanning to understanding the text information contained in a technical drawing. The crucial, and costly, step is the manual correction of OCR recognition errors. By benchmarking, we intend to identify, for our application domain, the causes of OCR errors that are the most costly to correct. For a given OCR engine, we model the correction cost as a function of image characteristics. Our methodology therefore relies on two issues: on the one hand, the design of the correction cost, which represents the difficulty of correction for a human operator; on the other hand, the classification of the image characteristics that may lead to OCR recognition errors. We analyse the behaviour of this correction cost by Principal Component Analysis (PCA), comparing the engines pairwise to discover their complementarity.
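The abstract does not say how the pairwise PCA is set up. As a minimal sketch, assume each engine yields one correction cost per document image. The caller supplies the cost values, and the function name `pairwise_pca` is hypothetical. Under that assumption, a two-variable PCA of two engines' costs can be computed in closed form. This illustrates the general technique and is not the authors' method.

```python
import math


def pairwise_pca(costs_a, costs_b):
    """Two-variable PCA of per-image correction costs for two OCR engines.

    Hypothetical helper: assumes costs_a[i] and costs_b[i] are the costs of
    correcting engine A's and engine B's output on the same image i.
    Returns ((lam1, lam2), pc1, explained), where lam1 >= lam2 are the
    eigenvalues of the sample covariance matrix, pc1 is the unit first
    principal direction, and explained = lam1 / (lam1 + lam2).
    """
    n = len(costs_a)
    if n != len(costs_b) or n < 2:
        raise ValueError("need two equal-length cost lists with at least 2 images")

    mean_a = sum(costs_a) / n
    mean_b = sum(costs_b) / n

    # Sample covariance matrix [[saa, sab], [sab, sbb]] of the centred costs.
    saa = sum((a - mean_a) ** 2 for a in costs_a) / (n - 1)
    sbb = sum((b - mean_b) ** 2 for b in costs_b) / (n - 1)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in zip(costs_a, costs_b)) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (saa + sbb) / 2
    disc = math.sqrt(((saa - sbb) / 2) ** 2 + sab ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc

    # Eigenvector for lam1 is (sab, lam1 - saa); if sab == 0 the matrix is
    # diagonal and the first component is the axis with the larger variance.
    if sab != 0:
        vx, vy = sab, lam1 - saa
    elif saa >= sbb:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)

    total = lam1 + lam2
    explained = lam1 / total if total > 0 else 0.0
    return (lam1, lam2), pc1, explained


if __name__ == "__main__":
    # Illustrative costs only; these are not data from the paper.
    eig, pc1, explained = pairwise_pca([1, 2, 3, 4], [2, 4, 6, 8])
    print(eig, pc1, explained)
```

Under this reading, a first component near the diagonal that explains nearly all the variance suggests both engines struggle on the same images. A first component along the anti-diagonal, from negatively correlated costs, would hint that one engine succeeds where the other fails, i.e. that the two are complementary. The abstract does not state how the paper itself interprets its PCA.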
Citation:
J.C. Lecoq, L. Najman, O. Gibot, E. Trupin, "Benchmarking Commercial OCR Engines for Technical Drawings Indexing," Sixth International Conference on Document Analysis and Recognition (ICDAR'01), pp. 0138, 2001.