An Incremental Clustering Scheme for Duplicate Detection in Large Databases
9th International Database Engineering & Application Symposium (IDEAS'05), Montreal, Canada, July 25-27, 2005
ISBN: 0-7695-2404-4
DOI: http://doi.ieeecomputersociety.org/10.1109/IDEAS.2005.10
We propose an incremental algorithm for clustering duplicate tuples in large databases. It assigns each new tuple t to the cluster containing the database tuples most similar to t, which are therefore likely to refer to the same real-world entity as t. The core of the approach is a hash-based indexing technique that tends to assign highly similar objects to the same buckets. An empirical evaluation shows that the proposed method achieves a considerable efficiency improvement over a state-of-the-art index structure for proximity searches in metric spaces.
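The abstract does not specify the hash function, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a MinHash-style locality-sensitive hashing over character 3-grams with banding, a standard way to make similar strings fall into the same buckets, and it may differ from the authors' actual index. The class `IncrementalDuplicateClusterer`, the similarity `threshold`, and the band parameters are all hypothetical. Each new tuple is compared only against tuples that share a bucket with it, and it joins the cluster of the most similar candidate or starts a new cluster.

```python
import hashlib


def qgrams(text: str, q: int = 3) -> set[str]:
    """Character q-grams of a lowercased string padded with '#'."""
    pad = "#" * (q - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two q-gram sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


class IncrementalDuplicateClusterer:
    """Hypothetical sketch: MinHash + banding buckets feed an incremental clusterer."""

    def __init__(self, num_bands: int = 8, rows_per_band: int = 2,
                 threshold: float = 0.5, q: int = 3):
        self.num_bands = num_bands
        self.rows = rows_per_band
        self.threshold = threshold      # assumed minimum similarity to join a cluster
        self.q = q
        self.buckets = {}               # (band index, band signature) -> tuple ids
        self.grams = []                 # tuple id -> q-gram set
        self.cluster_of = []            # tuple id -> cluster id
        self.num_clusters = 0

    def _minhash(self, grams: set[str]) -> list[int]:
        # One min-wise hash per signature position; seeded MD5 stands in for a hash family.
        return [
            min(int.from_bytes(hashlib.md5(f"{seed}:{g}".encode()).digest()[:8], "big")
                for g in grams)
            for seed in range(self.num_bands * self.rows)
        ]

    def add(self, text: str) -> int:
        """Index a new tuple and return the id of the cluster it is assigned to."""
        grams = qgrams(text, self.q)
        sig = self._minhash(grams)
        keys = [(b, tuple(sig[b * self.rows:(b + 1) * self.rows]))
                for b in range(self.num_bands)]
        # Candidates are the tuples sharing at least one bucket with the new tuple.
        candidates = {tid for key in keys for tid in self.buckets.get(key, [])}
        best_id, best_sim = None, 0.0
        for tid in candidates:
            sim = jaccard(grams, self.grams[tid])
            if sim > best_sim:
                best_id, best_sim = tid, sim
        if best_id is not None and best_sim >= self.threshold:
            cluster = self.cluster_of[best_id]      # join the most similar tuple's cluster
        else:
            cluster = self.num_clusters             # no close match: open a new cluster
            self.num_clusters += 1
        new_id = len(self.grams)
        self.grams.append(grams)
        self.cluster_of.append(cluster)
        for key in keys:
            self.buckets.setdefault(key, []).append(new_id)
        return cluster


c = IncrementalDuplicateClusterer()
print(c.add("John Smith, 12 Main Street"))   # first tuple opens cluster 0
print(c.add("John Smith, 12 Main Street"))   # exact duplicate joins cluster 0
print(c.add("Maria Garcia"))                 # unrelated tuple opens cluster 1
```

Exact duplicates always share every bucket. Near-duplicates such as "Jon Smith" share a bucket only with high probability, not with certainty. Increasing `num_bands` raises that probability at the cost of larger candidate sets, so the banding parameters trade recall against the number of similarity comparisons per insertion.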
Citation:
Eugenio Cesario, Francesco Folino, Giuseppe Manco, Luigi Pontieri, "An Incremental Clustering Scheme for Duplicate Detection in Large Databases," ideas, pp. 89-95, 9th International Database Engineering & Application Symposium (IDEAS'05), 2005.