Fourth International Conference Document Analysis and Recognition (ICDAR'97)
Representing OCRed documents in HTML
Ulm, Germany, August 18-20
ISBN: 0-8186-7898-4
OCR is an error-prone process, and manually proofreading OCR results is time-consuming and expensive. Errors remaining in OCRed text can seriously hinder reading and understanding when the reader cannot refer to the original image representation. As demonstrated in this paper, a hybrid document that combines symbolic representation and image representation may relieve the problem. If an OCRed document is represented properly in HTML, OCR errors will have little negative effect on the human reading process in an HTML browser, and they can be corrected with an HTML authoring tool. An experiment that uses this approach to evaluate a Japanese OCR system developed at CEDAR is also reported in this paper.
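The hybrid-document idea can be illustrated with a minimal sketch. The abstract does not say how the paper decides which parts of a page stay as images, so the sketch assumes a per-word confidence score and a fixed threshold: words above the threshold are emitted as text, and the rest are emitted as cropped word images with the OCR guess kept as alternative text. The names `OcrWord`, `render_hybrid_html`, the `confidence` and `image_src` fields, and the 0.8 threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class OcrWord:
    text: str          # OCR output for this word (may be wrong)
    confidence: float  # assumed recognizer score in [0.0, 1.0]
    image_src: str     # assumed path to the cropped word image


def render_hybrid_html(words, threshold=0.8):
    """Render OCR words as one HTML paragraph (hypothetical scheme).

    Confident words become escaped text; uncertain words become
    <img> elements so the reader sees the original pixels instead
    of a possibly wrong transcription.
    """
    parts = []
    for word in words:
        if word.confidence >= threshold:
            parts.append(escape(word.text))
        else:
            parts.append(
                '<img src="{}" alt="{}">'.format(
                    escape(word.image_src, quote=True),
                    escape(word.text, quote=True),
                )
            )
    return "<p>" + " ".join(parts) + "</p>"


if __name__ == "__main__":
    sample = [
        OcrWord("OCR", 0.95, "w0.png"),
        OcrWord("ernor", 0.40, "w1.png"),  # low confidence: shown as image
    ]
    print(render_hybrid_html(sample))
```

Under this assumed scheme, correcting an error in an HTML authoring tool would amount to replacing an `<img>` element with the right text.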
Index Terms:
optical character recognition; OCR errors; document representation; HTML browser; text errors; image representation; hybrid document; symbolic representation; human reading process; error correction; HTML authoring tool; Japanese OCR system evaluation
Citation:
Tao Hong, S.N. Srihari, "Representing OCRed documents in HTML," icdar, pp.831, Fourth International Conference Document Analysis and Recognition (ICDAR'97), 1997