Sixth IEEE International Conference on Data Mining (ICDM'06)
Improving Grouped-Entity Resolution Using Quasi-Cliques
Hong Kong
December 18-December 22
ISBN: 0-7695-2701-9
Byung-Won On, The Pennsylvania State University, USA
Ergin Elmacioglu, The Pennsylvania State University, USA
Dongwon Lee, The Pennsylvania State University, USA
Jaewoo Kang, NCSU & Korea Univ., Korea
Jian Pei, Simon Fraser Univ., Canada
The entity resolution (ER) problem, which identifies duplicate entities that refer to the same real-world entity, is essential in many applications. In this paper, we focus on resolving entities that each contain a group of related elements (e.g., an author entity with a list of citations, a singer entity with a song list, or an intermediate result of a GROUP BY SQL query). Such entities, which we call grouped-entities, occur frequently in many applications. Previous approaches to grouped-entity resolution often rely on textual similarity and produce a large number of false positives. As a complementary technique, we present our experience of applying a recently proposed graph mining technique, Quasi-Clique, atop conventional ER solutions. Our approach exploits contextual information mined from the group of elements per entity in addition to syntactic similarity. Extensive experiments verify that our proposal improves precision and recall by up to 83% when used together with a variety of existing ER solutions, and never worsens them.
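The abstract does not spell out how a quasi-clique is defined or how the context graph is built, so the following is only a minimal sketch under stated assumptions. It assumes one common definition from the graph mining literature: a vertex set is a gamma-quasi-clique if every vertex is adjacent to at least ceil(gamma * (n - 1)) of the other n - 1 vertices in the set. The co-author graph, the helper names `build_adjacency` and `is_quasi_clique`, and the threshold values are all hypothetical illustrations, not the authors' implementation.

```python
from math import ceil


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_quasi_clique(adj, nodes, gamma):
    """Check the assumed gamma-quasi-clique condition on the vertex set `nodes`.

    Every vertex must have at least ceil(gamma * (n - 1)) neighbours
    inside `nodes`, where n = len(nodes). gamma = 1.0 is an ordinary clique.
    """
    nodes = set(nodes)
    if len(nodes) < 2:
        return True  # a single vertex satisfies the condition trivially
    required = ceil(gamma * (len(nodes) - 1))
    return all(len(adj.get(v, set()) & nodes) >= required for v in nodes)


# Hypothetical context graph for one author entity: vertices are
# co-authors, edges link co-authors who appear on the same citation.
coauthor_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
coauthor_graph = build_adjacency(coauthor_edges)

triangle_is_clique = is_quasi_clique(coauthor_graph, {"a", "b", "c"}, 1.0)
whole_graph_half = is_quasi_clique(coauthor_graph, {"a", "b", "c", "d"}, 0.5)
whole_graph_loose = is_quasi_clique(coauthor_graph, {"a", "b", "c", "d"}, 0.3)
```

Under these assumptions, lowering gamma admits looser groups: the triangle {a, b, c} passes even at gamma = 1.0, while the full four-vertex set fails at 0.5 (vertex d has only one neighbour in the set) and passes at 0.3. Two entities whose context graphs share such dense groups would count as contextually similar even when their names differ textually.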
Citation:
Byung-Won On, Ergin Elmacioglu, Dongwon Lee, Jaewoo Kang, Jian Pei, "Improving Grouped-Entity Resolution Using Quasi-Cliques," Sixth IEEE International Conference on Data Mining (ICDM'06), pp. 1008-1015, 2006