First International Workshop on Document Image Analysis for Libraries (DIAL'04)
Document Style Census for OCR
Palo Alto, California
January 23-January 24
ISBN: 0-7695-2088-X
George Nagy, Rensselaer Polytechnic Institute
Prateek Sarkar, Palo Alto Research Center
Four methods of converting paper documents to computer-readable form are compared with regard to hypothetical labor cost: keyboarding, omnifont OCR, style-specific OCR, and style-constrained or style-adaptive OCR. The best choice is determined primarily by (1) the reject rates of the various OCR systems at a given error rate, (2) the fraction of the material that must be labeled for training the system, and (3) the cost of partitioning the material according to style. For large corpora, sampling strategies are proposed both for estimating conversion costs and for taking advantage of style homogeneity.
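The trade-off described in the abstract can be illustrated with a rough sketch. The code below is not the authors' cost model. It is a hypothetical linear approximation that assumes labor cost is the sum of manual correction of rejected characters, labeling of training material, and a fixed cost for partitioning by style. All names (`Method`, `labor_cost`, `cheapest`) and all numeric values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Method:
    """One conversion method, described by the three factors named in the abstract.

    All fields are hypothetical parameters, not values from the paper.
    """
    name: str
    reject_rate: float       # fraction of characters rejected and keyed in by hand
    labeled_fraction: float  # fraction of the material labeled for training
    partition_cost: float    # fixed labor cost of sorting the material by style


def labor_cost(m: Method, n_chars: int,
               key_cost: float = 1.0, label_cost: float = 1.0) -> float:
    """Assumed linear cost: manual correction + training labels + partitioning."""
    manual = m.reject_rate * n_chars * key_cost
    training = m.labeled_fraction * n_chars * label_cost
    return manual + training + m.partition_cost


def cheapest(methods: list[Method], n_chars: int) -> Method:
    """Return the method with the lowest assumed labor cost for a corpus size."""
    return min(methods, key=lambda m: labor_cost(m, n_chars))


# Illustrative, invented numbers only. Keyboarding is modeled as a 100% reject rate.
EXAMPLE_METHODS = [
    Method("keyboarding", 1.00, 0.00, 0.0),
    Method("omnifont OCR", 0.20, 0.00, 0.0),
    Method("style-specific OCR", 0.05, 0.02, 5000.0),
    Method("style-constrained OCR", 0.08, 0.00, 0.0),
]
```

Calling `cheapest(EXAMPLE_METHODS, n_chars)` for different corpus sizes shows how a fixed partitioning cost can matter for small collections and become negligible for large ones, under these assumed parameters.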
Citation:
George Nagy, Prateek Sarkar, "Document Style Census for OCR," First International Workshop on Document Image Analysis for Libraries (DIAL'04), p. 134, 2004