Ninth International Workshop on Frontiers in Handwriting Recognition (2004)
Kokubunji, Tokyo, Japan
Oct. 26, 2004 to Oct. 29, 2004
ISSN: 1550-5235
ISBN: 0-7695-2187-8
pp: 280-285
Takeshi Nagasaki, Hitachi, Ltd.
Katsumi Marukawa, Hitachi, Ltd.
Toshikazu Takahashi, Hitachi, Ltd.
This paper describes a new document retrieval method that is tolerant of OCR segmentation errors in document images. To overcome the segmentation and recognition errors that most OCR-based retrieval systems suffer from, the proposed method works in two processing phases. First, the OCR engine generates multiple character-segmentation and recognition hypotheses. Then the retrieval engine extracts keywords from these hypotheses by using lexicon-driven dynamic programming (DP) matching. We have applied this method to both handwritten and printed document images and have demonstrated its effectiveness in reducing false drops and false alarms.
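
The two phases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lattice representation, the cost values, the `miss_cost` penalty, and the function name `match_keyword` are all assumptions made for illustration. The OCR output is modeled as a list of edges between segmentation boundaries, each carrying candidate characters with recognition costs, and a keyword from the lexicon is matched against it by DP, one keyword character per lattice edge.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lattice edge: (start_boundary, end_boundary, {candidate_char: cost}).
# Overlapping edges encode alternative segmentations of the same image region.
Edge = Tuple[int, int, Dict[str, float]]


def match_keyword(
    lattice: List[Edge], keyword: str, miss_cost: float = 1.0
) -> Optional[Tuple[float, int, int]]:
    """Return (cost, start, end) of the cheapest lattice path that spells
    `keyword` with one character per edge, or None if no path is long enough.

    `miss_cost` (an assumed penalty) is charged when the keyword character
    is not among an edge's recognition candidates.
    """
    nodes = sorted({n for s, e, _ in lattice for n in (s, e)})
    # best[node] = (cost, start_node) after matching the characters so far;
    # before any character is matched, the keyword may start at any boundary.
    best: Dict[int, Tuple[float, int]] = {n: (0.0, n) for n in nodes}
    for ch in keyword:
        nxt: Dict[int, Tuple[float, int]] = {}
        for s, e, cands in lattice:
            if s not in best:
                continue
            cost, start = best[s]
            new_cost = cost + cands.get(ch, miss_cost)
            if e not in nxt or new_cost < nxt[e][0]:
                nxt[e] = (new_cost, start)
        best = nxt
    if not best:
        return None
    end, (cost, start) = min(best.items(), key=lambda kv: kv[1][0])
    return cost, start, end


# Toy lattice for an image of "corn": the region between boundaries 2 and 4
# is hypothesized both as "r" + "n" and as a single "m".
example_lattice: List[Edge] = [
    (0, 1, {"c": 0.25}),
    (1, 2, {"o": 0.25}),
    (2, 3, {"r": 0.5}),
    (3, 4, {"n": 0.25}),
    (2, 4, {"m": 0.5}),
]

if __name__ == "__main__":
    for word in ("corn", "com", "corner"):
        print(word, match_keyword(example_lattice, word))
```

Because both segmentations of the ambiguous region stay in the lattice, "corn" and "com" can each be matched along their own path. In a retrieval setting, a threshold on the returned cost (also an assumption here) would decide whether a keyword counts as a hit.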
