Fourth International Conference Document Analysis and Recognition (ICDAR'97)
The Detection of Duplicates in Document Image Databases
Ulm, GERMANY
August 18-August 20
ISBN: 0-8186-7898-4
In this paper, we propose and implement a method for detecting duplicate documents in very large image databases. The method is based on a robust "signature" extracted from each document image, which is used to index into a table of previously processed documents. The approach has a number of advantages over OCR or other recognition-based methods, including speed and robustness to imaging distortions. To justify the approach and test its scalability, we have developed a simulator that allows us to change system parameters and examine performance for millions of document signatures. A complete system is implemented and tested on a collection of technical articles and memos.
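The abstract does not say how the signature is computed or how the table is organized, so the following is only a minimal sketch of the general idea of indexing signatures into a table of previously processed documents. It assumes, purely for illustration, that a signature is a string of symbols, that the table is keyed by overlapping n-grams of that string, and that candidates are ranked by Jaccard overlap. The names `SignatureIndex`, `ngrams`, and `threshold` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def ngrams(signature, n=3):
    """Return the set of overlapping n-grams of a signature string."""
    return {signature[i:i + n] for i in range(len(signature) - n + 1)}


class SignatureIndex:
    """Toy table of previously processed documents, keyed by signature n-grams.

    Assumption: a signature is a plain string of symbols. The paper's real
    signature and table layout are not described in the abstract.
    """

    def __init__(self, n=3):
        self.n = n
        self.table = defaultdict(set)  # n-gram -> ids of documents containing it
        self.sizes = {}                # doc id -> number of distinct n-grams

    def add(self, doc_id, signature):
        """Register a processed document under each n-gram of its signature."""
        grams = ngrams(signature, self.n)
        self.sizes[doc_id] = len(grams)
        for gram in grams:
            self.table[gram].add(doc_id)

    def query(self, signature, threshold=0.6):
        """Return (doc_id, score) pairs whose Jaccard overlap meets the threshold."""
        grams = ngrams(signature, self.n)
        votes = defaultdict(int)
        for gram in grams:
            # .get avoids creating empty entries for unseen n-grams
            for doc_id in self.table.get(gram, ()):
                votes[doc_id] += 1
        results = []
        for doc_id, shared in votes.items():
            union = len(grams) + self.sizes[doc_id] - shared
            score = shared / union
            if score >= threshold:
                results.append((doc_id, score))
        return sorted(results, key=lambda pair: -pair[1])


index = SignatureIndex()
index.add("memo-1", "ABCDEFGHIJKLMNOP")
index.add("article-2", "ZZYYXXWWVVUUTTSS")

# A copy of memo-1 with one corrupted symbol, standing in for an imaging distortion.
candidates = index.query("ABCDEFGHXJKLMNOP")
print(candidates)
```

Voting over n-grams rather than hashing the whole signature is one way to tolerate a few corrupted symbols: a single substitution changes only the n-grams that overlap it, so the distorted copy still shares most table entries with the original.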
Citation:
D. Doermann, H. Li, O. Kia, "The Detection of Duplicates in Document Image Databases," icdar, pp. 314, Fourth International Conference Document Analysis and Recognition (ICDAR'97), 1997