Issue No. 02 - February (1996 vol. 18)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/34.481536
<p><b>Abstract</b>—In this paper, we consider the problem of evaluating character image generators that model distortions encountered in optical character recognition (OCR). While a number of such defect models have been proposed, the contention that they produce the desired result is typically argued in an ad hoc and informal way. We introduce a rigorous and more pragmatic definition of when a model is accurate: we say a defect model is validated if the OCR errors induced by the model are indistinguishable from the errors encountered when using real scanned documents. We describe four measures to quantify this similarity, and compare and contrast them using over ten million scanned and synthesized characters in three fonts. The measures differentiate effectively between different fonts and different scans of the same font regardless of the underlying text.</p>
Index Terms: Optical character recognition, document image defect models, OCR error classification, defect model validation.
Y. Li, G. Nagy, D. Lopresti and A. Tomkins, "Validation of Image Defect Models for Optical Character Recognition," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 18, no. 2, pp. 99-108, 1996.