May 13, 1996 to May 15, 1996
In this paper, we examine the effects of simulated OCR errors on Boolean query models for information retrieval. We show that even relatively small amounts of such noise can have a significant impact. To address this issue, we formulate new variants of the traditional models by combining two classic paradigms for dealing with imprecise data: approximate string matching and fuzzy logic. Using a recall/precision analysis of an experiment involving nearly 60 million query evaluations, we demonstrate that the new fuzzy retrieval methods are generally more robust than their "sharp" counterparts.
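One way to picture the combination of approximate string matching and fuzzy logic is the minimal sketch below. It is an illustration under stated assumptions, not the paper's actual formulation: the normalized edit-distance similarity, the min/max fuzzy operators, and all function names (`edit_distance`, `term_membership`, `fuzzy_and`, `fuzzy_or`, `fuzzy_not`) are hypothetical choices made here for clarity.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program (row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def term_membership(term: str, doc_words: list[str]) -> float:
    """Degree in [0, 1] to which `term` occurs in the document.

    Assumption: similarity is 1 - distance / longer length, and the
    document's score is that of its best-matching word.
    """
    best = 0.0
    for word in doc_words:
        longest = max(len(term), len(word))
        if longest == 0:
            continue
        best = max(best, 1.0 - edit_distance(term, word) / longest)
    return best


# Assumption: the classic min/max/complement fuzzy operators.
def fuzzy_and(*degrees: float) -> float:
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    return max(degrees)


def fuzzy_not(degree: float) -> float:
    return 1.0 - degree


if __name__ == "__main__":
    # Simulated OCR noise: "m" misread as "rn" in "information".
    doc = "rnodern inforrnation retrieval".split()
    sharp = "information" in doc and "retrieval" in doc
    fuzzy = fuzzy_and(term_membership("information", doc),
                      term_membership("retrieval", doc))
    print("sharp match:", sharp)
    print("fuzzy score:", round(fuzzy, 3))
```

In this toy case the sharp Boolean query `information AND retrieval` misses the document entirely because of a single misrecognized character, while the fuzzy variant still assigns it a high score that can be used for ranking or thresholding.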
Dan Lopresti, "Robust Retrieval of Noisy Text", Advances in Digital Libraries Conference (ADL), IEEE, 1996, pp. 76, doi:10.1109/ADL.1996.502518