Fourth International Conference Document Analysis and Recognition (ICDAR'97)
The Detection of Duplicates in Document Image Databases
Ulm, GERMANY
August 18-August 20
ISBN: 0-8186-7898-4
D. Doermann, Institute for Advanced Computer Studies, University of Maryland, College Park
H. Li, Institute for Advanced Computer Studies, University of Maryland, College Park
O. Kia, Institute for Advanced Computer Studies, University of Maryland, College Park
In this paper we propose and implement a method for detecting duplicate documents in very large image databases. The method is based on a robust "signature" extracted from each document image, which is used to index into a table of previously processed documents. The approach has a number of advantages over OCR and other recognition-based methods, including speed and robustness to imaging distortions. To justify the approach and test its scalability, we have developed a simulator that allows us to change parameters of the system and examine performance for millions of document signatures. A complete system is implemented and tested on a collection of technical articles and memos.
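The abstract does not say how the signature is computed or how the table is organized, so the following is only a minimal sketch of the general idea of indexing a table by signature. It assumes each document image has already been reduced to a sequence of small integer codes. The class `SignatureIndex`, the n-gram fragmentation, and the `min_overlap` threshold are all hypothetical choices made for illustration, not the authors' method.

```python
from collections import defaultdict


class SignatureIndex:
    """Toy duplicate detector (illustrative only, not the paper's algorithm).

    A signature is assumed to be a sequence of hashable codes. It is split
    into overlapping n-gram fragments so that a few corrupted codes do not
    prevent a match.
    """

    def __init__(self, ngram=3, min_overlap=0.8):
        self.ngram = ngram              # fragment length (assumed parameter)
        self.min_overlap = min_overlap  # fraction of query fragments that must match
        self.table = defaultdict(set)   # fragment -> ids of documents containing it

    def _fragments(self, signature):
        n = self.ngram
        return {tuple(signature[i:i + n]) for i in range(len(signature) - n + 1)}

    def add(self, doc_id, signature):
        """Register a processed document under each of its fragments."""
        for fragment in self._fragments(signature):
            self.table[fragment].add(doc_id)

    def query(self, signature):
        """Return sorted ids of stored documents that share enough fragments."""
        fragments = self._fragments(signature)
        if not fragments:
            return []
        votes = defaultdict(int)
        for fragment in fragments:
            for doc_id in self.table.get(fragment, ()):
                votes[doc_id] += 1
        return sorted(
            doc_id
            for doc_id, count in votes.items()
            if count / len(fragments) >= self.min_overlap
        )
```

In use, each incoming document would be passed to `query` first and to `add` afterwards. Lookup cost then depends on the number of fragments in the query rather than on the number of stored documents, which is the property a table-based scheme relies on to scale.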
Citation:
D. Doermann, H. Li, O. Kia, "The Detection of Duplicates in Document Image Databases," Fourth International Conference Document Analysis and Recognition (ICDAR'97), p. 314, 1997