Fourth International Conference Document Analysis and Recognition (ICDAR'97)
Using Character Shape Coding for Information Retrieval
Ulm, GERMANY
August 18-August 20
ISBN: 0-8186-7898-4
A.F. Smeaton, Dublin City University
A.L. Spitz, Daimler Benz Research & Technology Center
In conventional information retrieval the task of finding users' search terms in a document is simple. When the document is not available in machine-readable format, optical character recognition (OCR) can usually be performed. We have developed a technique for performing information retrieval on document images with accuracy high enough to be of practical use. The method makes generalisations about the images of characters, classifies them, and agglomerates the resulting character shape codes into word tokens. These tokens are sufficiently specific in their representation of the underlying words to allow reasonable retrieval performance. Using a collection of over 250 Mbytes of document texts and queries with known relevance assessments, we present a series of experiments to determine how various parameters in the retrieval strategy affect retrieval performance, and we obtain surprisingly good results.
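The idea of collapsing characters into a few shape classes and matching on the resulting word tokens can be illustrated with a minimal sketch. The class table below (ascender/capital/digit as `A`, plain x-height as `x`, descender as `g`, with `i` and `j` kept distinct) is a simplified assumption made for illustration, not the coding scheme defined in the paper, and the function names `shape_code`, `word_shape_token`, `build_index`, and `search` are hypothetical. The sketch also works from text rather than from character images, so it only shows the token matching step.

```python
from collections import defaultdict

# Assumed, simplified shape classes (illustrative only, not the paper's table).
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")


def shape_code(ch: str) -> str:
    """Map one character to a coarse shape class."""
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"  # tall glyphs: capitals, digits, ascenders
    if ch in DESCENDERS:
        return "g"  # glyphs extending below the baseline
    if ch in ("i", "j"):
        return ch   # dotted glyphs kept as their own classes
    if ch.islower():
        return "x"  # remaining x-height glyphs
    return ""       # punctuation and other symbols are dropped


def word_shape_token(word: str) -> str:
    """Agglomerate per-character shape codes into a word token."""
    return "".join(shape_code(c) for c in word)


def build_index(docs: dict) -> dict:
    """Build an inverted index from word shape tokens to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            token = word_shape_token(word)
            if token:
                index[token].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing a token shared with the query."""
    results = set()
    for word in query.split():
        results |= index.get(word_shape_token(word), set())
    return results
```

Under this assumed table, distinct words can share a token (for example "cat" and "oat" both become `xxA`), which is why the specificity of the tokens matters for retrieval performance.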
Citation:
A.F. Smeaton, A.L. Spitz, "Using Character Shape Coding for Information Retrieval," icdar, pp.974, Fourth International Conference Document Analysis and Recognition (ICDAR'97), 1997