Fast Selection of Small and Precise Candidate Sets from Dictionaries for Text Correction Tasks
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 1
Curitiba, Paraná, Brazil, September 23-26, 2007
ISBN: 0-7695-2822-8
Lexical text correction relies on a central step in which approximate search in a dictionary is used to select the best correction suggestions for an ill-formed input token. In previous work, we introduced the concept of a universal Levenshtein automaton and showed how to use these automata to efficiently select from a dictionary all entries within a fixed Levenshtein distance of the garbled input word. In this paper, we look at refinements of the basic Levenshtein distance that yield more sensible notions of similarity for distinct text correction applications, e.g., OCR. We show that the concept of a universal Levenshtein automaton can be adapted to these refinements. In this way, we obtain a method for selecting correction candidates that is very efficient while also producing small candidate sets with high recall.
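To make the selection task concrete, the sketch below returns every dictionary entry within Levenshtein distance k of an input token. It is not the universal Levenshtein automaton described above, and it does not cover the refined distances; it swaps in a common alternative, a depth-first walk over a trie of the dictionary that maintains one dynamic-programming row per trie node and prunes a branch once every cell of the row exceeds k. For plain Levenshtein distance it yields the same candidate set. The names `TrieNode`, `build_trie`, and `candidates` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """A node in a character trie built over the dictionary (illustrative)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a dictionary entry ends here


def build_trie(words: List[str]) -> TrieNode:
    """Insert every dictionary entry into a character trie."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def candidates(root: TrieNode, query: str, k: int) -> List[Tuple[str, int]]:
    """Return all (entry, distance) pairs with Levenshtein distance <= k,
    sorted by distance and then alphabetically."""
    results: List[Tuple[str, int]] = []
    # DP row for the empty entry prefix: distance("", query[:j]) == j.
    first_row = list(range(len(query) + 1))

    def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # Extend the entry prefix by ch and compute the next DP row.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            row.append(min(row[j - 1] + 1,           # extra character in query
                           prev_row[j] + 1,          # extra character in entry
                           prev_row[j - 1] + cost))  # match or substitution
        if node.word is not None and row[-1] <= k:
            results.append((node.word, row[-1]))
        # Every cell of the next row is >= min(row), so prune when min(row) > k.
        if min(row) <= k:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    if root.word is not None and first_row[-1] <= k:
        results.append((root.word, first_row[-1]))
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results, key=lambda p: (p[1], p[0]))
```

For example, with the toy dictionary `["hello", "help", "hell", "shell", "world", "yellow"]`, `candidates(build_trie(words), "helo", 1)` returns `[('hell', 1), ('hello', 1), ('help', 1)]`, and raising the bound to 2 adds `('shell', 2)`. The pruning step is what keeps the walk cheap: whole subtrees of the trie are skipped once no extension of the current prefix can return within distance k.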
Citation:
K. Schulz, S. Mihov, P. Mitankin, "Fast Selection of Small and Precise Candidate Sets from Dictionaries for Text Correction Tasks," icdar, vol. 1, pp. 471-475, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 1, 2007