Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 2
Lexical Postcorrection of OCR-Results: The Web as a Dynamic Secondary Dictionary?
Edinburgh, Scotland
August 03-August 06
ISBN: 0-7695-1960-1
Christian M. Strohmaier, SfS - Univ. of Tübingen
Christoph Ringlstetter, CIS - Univ. of Munich
Klaus U. Schulz, CIS - Univ. of Munich
Stoyan Mihov, LPDP - Bulgarian Academy of Sciences
Postcorrection of OCR-results for text documents is usually based on electronic dictionaries. When scanning texts from a specific thematic area, conventional dictionaries often miss a considerable number of tokens. Furthermore, if word frequencies are stored with the entries, these frequencies will not properly reflect the frequencies found in the given thematic area. Correction adequacy suffers from both shortcomings. We report on a series of experiments in which we compare (1) the use of fixed, static large-scale dictionaries (including proper names and abbreviations) with (2) the use of dynamic dictionaries retrieved via an automated analysis of the vocabulary of web pages from a given domain, and (3) the use of mixed dictionaries. Our experiments, which address English and German document collections from a variety of fields, show that dynamic dictionaries of the above-mentioned form can significantly improve coverage for the given thematic area and help to improve the quality of lexical postcorrection methods.
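The comparison of static, dynamic, and mixed dictionaries can be illustrated with a minimal Python sketch of token coverage. This is not the authors' implementation: the function names `tokenize` and `coverage`, the toy word list, and the sample "web page" strings are all hypothetical, and a real setup would retrieve domain pages from the web and would also take word frequencies into account.

```python
import re

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (hypothetical helper)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def coverage(tokens, dictionary):
    """Fraction of tokens that occur in the dictionary."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in dictionary)
    return hits / len(tokens)

# (1) Static dictionary: a toy stand-in for a fixed large-scale word list.
static_dict = {"the", "of", "is", "a", "in", "found", "plant"}

# (2) Dynamic dictionary: vocabulary collected from pages of the target
# domain. The strings below are made-up stand-ins for retrieved web pages.
web_pages = [
    "Mycorrhiza is a symbiosis of fungus and plant roots.",
    "The hyphae of the fungus grow around the roots.",
]
dynamic_dict = set()
for page in web_pages:
    dynamic_dict.update(tokenize(page))

# (3) Mixed dictionary: union of the static and the dynamic vocabulary.
mixed_dict = static_dict | dynamic_dict

# Tokens of a (made-up) document from the same thematic area.
doc_tokens = tokenize("The mycorrhiza is found in the roots of a plant")

for name, dictionary in [("static", static_dict),
                         ("dynamic", dynamic_dict),
                         ("mixed", mixed_dict)]:
    print(f"{name:8s} coverage: {coverage(doc_tokens, dictionary):.2f}")
```

In this toy example the static dictionary misses the domain terms, the dynamic dictionary misses some general words, and the mixed dictionary covers both. The numbers say nothing about the results reported in the paper.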
Citation:
Christian M. Strohmaier, Christoph Ringlstetter, Klaus U. Schulz, Stoyan Mihov, "Lexical Postcorrection of OCR-Results: The Web as a Dynamic Secondary Dictionary?," icdar, vol. 2, pp.1133, Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 2, 2003