10th International Database Engineering and Applications Symposium (IDEAS'06)
Effective Incremental Clustering for Duplicate Detection in Large Databases
Delhi, India
December 11-14, 2006
ISBN: 0-7695-2577-6
Francesco Folino, ICAR-CNR, Via Bucci 41c, Italy
Giuseppe Manco, ICAR-CNR, Via Bucci 41c, Italy
Luigi Pontieri, ICAR-CNR, Via Bucci 41c, Italy
We propose an incremental algorithm for discovering clusters of duplicate tuples in large databases. The core of the approach is an indexing technique which, for any newly arrived tuple, efficiently retrieves the set of tuples in the database that are most similar to it and are therefore likely to refer to the same real-world entity. The proposed index is based on a hashing approach which tends to assign similar objects to the same buckets. Empirical and analytical evaluation demonstrates that the proposed approach achieves satisfactory efficiency, at the cost of a small loss in accuracy.
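The abstract does not say which hash family the index uses, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes tuples are compared as sets of lowercase tokens, swaps in MinHash with banding as the similarity-preserving hash, and uses Jaccard similarity with a hypothetical threshold of 0.5 to decide whether a new tuple joins an existing cluster. All names (`IncrementalDeduper`, `minhash_signature`, `band_size`, `threshold`) are made up for the example.

```python
import hashlib


def tokens(text):
    # Assumption: a tuple is represented as the set of its lowercase tokens.
    return set(text.lower().split())


def minhash_signature(token_set, num_hashes=8):
    # One MinHash value per seed: the smallest hash over all tokens.
    return tuple(
        min(
            (int(hashlib.md5(f"{seed}:{tok}".encode()).hexdigest(), 16)
             for tok in token_set),
            default=0,  # empty tuples get a fixed signature
        )
        for seed in range(num_hashes)
    )


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


class IncrementalDeduper:
    """Assigns each arriving tuple to a cluster using a bucketed hash index."""

    def __init__(self, num_hashes=8, band_size=2, threshold=0.5):
        self.num_hashes = num_hashes
        self.band_size = band_size
        self.threshold = threshold   # hypothetical similarity cut-off
        self.buckets = {}            # (band start, band values) -> tuple ids
        self.records = []            # token set of each stored tuple
        self.cluster_of = []         # cluster id of each stored tuple
        self.next_cluster = 0

    def add(self, text):
        toks = tokens(text)
        sig = minhash_signature(toks, self.num_hashes)
        keys = [(i, sig[i:i + self.band_size])
                for i in range(0, self.num_hashes, self.band_size)]

        # Only tuples sharing at least one bucket are compared in full.
        candidates = {rid for k in keys for rid in self.buckets.get(k, ())}
        best = max(candidates,
                   key=lambda rid: jaccard(toks, self.records[rid]),
                   default=None)

        if best is not None and jaccard(toks, self.records[best]) >= self.threshold:
            cluster = self.cluster_of[best]   # join the nearest neighbor's cluster
        else:
            cluster = self.next_cluster       # start a new cluster
            self.next_cluster += 1

        new_id = len(self.records)
        self.records.append(toks)
        self.cluster_of.append(cluster)
        for k in keys:
            self.buckets.setdefault(k, []).append(new_id)
        return cluster


deduper = IncrementalDeduper()
first = deduper.add("John Smith 41 Via Bucci Rende")
same = deduper.add("john smith 41 via bucci rende")
other = deduper.add("Maria Rossi 7 Corso Italia Milano")
```

In this sketch each arrival costs one signature computation plus comparisons against bucket-mates only, rather than a scan of the whole database. The trade-off mirrors the one the abstract describes: a true duplicate that happens to share no bucket with the new tuple is missed, which is where the accuracy loss would come from.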
Citation:
Francesco Folino, Giuseppe Manco, Luigi Pontieri, "Effective Incremental Clustering for Duplicate Detection in Large Databases," 10th International Database Engineering and Applications Symposium (IDEAS'06), pp. 45-52, 2006