Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 1
Indexing and retrieval of words in old documents
Edinburgh, Scotland
August 03-August 06
ISBN: 0-7695-1960-1
Simone Marinai, DSI - University of Florence
Emanuele Marino, DSI - University of Florence
Giovanni Soda, DSI - University of Florence
This paper describes a system for efficient indexing and retrieval of words in collections of document images. The proposed method is based on two main principles: unsupervised prototype clustering and string encoding for efficient string matching. During indexing, a self-organizing map (SOM) is trained to cluster similar symbols (character-like objects) in a subset of the documents to be stored. Using the trained SOM, the words in the whole collection can be stored and represented with a fixed-length description that can be easily compared in order to score the most similar words in response to a user query.
The system can be automatically adapted to different languages and font styles. The most appropriate applications are in the processing of old documents (18th and 19th centuries), where current OCR systems have more difficulty. Experimental results describe three application scenarios with various levels of difficulty for current OCR systems.
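The indexing idea summarized in the abstract (cluster symbols with a SOM, then give each word a fixed-length description that can be compared against a query) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `train_som`, `encode_word` and `rank_words`, the SOM hyperparameters, the use of generic feature vectors for symbols, and in particular the choice of a normalized histogram of best-matching SOM units as the fixed-length description. The abstract does not specify the encoding the authors use.

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Initialise each unit from a randomly chosen training sample.
    weights = data[rng.integers(0, n, rows * cols)].astype(float)
    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-2     # shrinking neighbourhood radius
            # Best-matching unit: the prototype closest to the sample.
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighbourhood weights
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights

def encode_word(symbols, weights):
    """Assumed fixed-length code: normalised histogram of best-matching units."""
    d = ((symbols[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    bmus = d.argmin(axis=1)
    hist = np.bincount(bmus, minlength=len(weights)).astype(float)
    return hist / hist.sum()

def rank_words(query_code, index_codes):
    """Return indices of indexed words, most similar (smallest distance) first."""
    dist = np.linalg.norm(index_codes - query_code, axis=1)
    return np.argsort(dist, kind="stable")
```

In this sketch a word is passed to `encode_word` as an array of per-symbol feature vectors, and a query is answered by encoding it the same way and calling `rank_words` against the stored codes. A histogram ignores the order of symbols within a word, so it is only a stand-in that shows why a fixed-length code makes comparison cheap; an order-preserving encoding would be needed to distinguish words made of the same symbols.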
Citation:
Simone Marinai, Emanuele Marino, Giovanni Soda, "Indexing and retrieval of words in old documents," icdar, vol. 1, pp.223, Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 1, 2003