From the December 2013 issue
A Blocking Framework for Entity Resolution in Highly Heterogeneous Information Spaces
By George Papadakis, Ekaterini Ioannou, Themis Palpanas, Claudia Niederée, and Wolfgang Nejdl
In the context of entity resolution (ER) in highly heterogeneous, noisy, user-generated entity collections, practically all block building methods employ redundancy to achieve high effectiveness. This practice, however, results in a high number of pairwise comparisons, with a negative impact on efficiency. Existing block processing strategies aim at discarding unnecessary comparisons at no cost in effectiveness. In this paper, we systemize blocking methods for clean-clean ER (an inherently quadratic task) over highly heterogeneous information spaces (HHIS) through a novel framework that consists of two orthogonal layers: the effectiveness layer encompasses methods for building overlapping blocks with small likelihood of missed matches; the efficiency layer comprises a rich variety of techniques that significantly restrict the required number of pairwise comparisons, having a controllable impact on the number of detected duplicates. We map to our framework all relevant existing methods for creating and processing blocks in the context of HHIS, and additionally propose two novel techniques: attribute clustering blocking and comparison scheduling. We evaluate the performance of each layer and method on two large-scale, real-world data sets and validate the excellent balance between efficiency and effectiveness that they achieve.
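To make the two layers concrete, here is a minimal sketch rather than the authors' implementation. It assumes the simplest redundancy-based scheme, token blocking, for the effectiveness layer, and a plain de-duplication of repeated pairs for the efficiency layer. The function names (`token_blocks`, `total_comparisons`, `distinct_comparisons`) and the toy entity profiles are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def token_blocks(collection_a, collection_b):
    """Effectiveness layer (assumed scheme: token blocking).

    Every token found in any attribute value defines a block, so an entity
    lands in several overlapping blocks. Only tokens that occur in both
    collections are kept, since clean-clean ER compares across collections.
    """
    index_a, index_b = defaultdict(set), defaultdict(set)
    for index, collection in ((index_a, collection_a), (index_b, collection_b)):
        for entity_id, values in collection.items():
            for value in values:
                for token in value.lower().split():
                    index[token].add(entity_id)
    shared_tokens = index_a.keys() & index_b.keys()
    return {token: (index_a[token], index_b[token]) for token in shared_tokens}


def total_comparisons(blocks):
    """Comparisons executed if every block is processed independently."""
    return sum(len(ids_a) * len(ids_b) for ids_a, ids_b in blocks.values())


def distinct_comparisons(blocks):
    """Efficiency layer (assumed step): drop pairs repeated across blocks."""
    pairs = set()
    for ids_a, ids_b in blocks.values():
        pairs.update(product(ids_a, ids_b))
    return pairs


# Toy, schema-free entity profiles: entity id -> list of attribute values.
collection_a = {
    "a1": ["Jack Miller", "car seller"],
    "a2": ["Erick Green", "teacher"],
}
collection_b = {
    "b1": ["miller jack", "seller"],
    "b2": ["Nick Papas", "teacher"],
}

blocks = token_blocks(collection_a, collection_b)
print(sorted(blocks))                # tokens shared by both collections
print(total_comparisons(blocks))     # comparisons including repeats
print(sorted(distinct_comparisons(blocks)))
```

In this toy input the redundancy places the pair `a1`/`b1` in three blocks, so removing repeated pairs cuts the work without losing any candidate match. The techniques surveyed in the paper go further than this de-duplication step.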
Editorials and Announcements
- Get Your Journals as eBooks for Free
- TKDE celebrates its 25th Anniversary. Editor-in-Chief Jian Pei says, "We are celebrating the 25th Anniversary of TKDE. Since its first issue in March 1989, TKDE has published 2,981 articles, and another 220 articles in the early access portal. With 898 submissions and 79 accepted articles in 2012, TKDE is now the premier journal in the broad and general fields of data management, data mining, and knowledge engineering. We thank all the authors, reviewers, and readers for their continuing support to TKDE. As always, we are eager to hear your ideas and suggestions, and will do our best to meet your expectations. With all your passions, contributions, and supports, TKDE is embracing the new era of big data and big data analytics. Happy birthday to TKDE!"
- eBooks of issues of TKDE can now be downloaded from the Computer Society Digital Library
- Editorial (August 2013)
- New EIC Editorial (Feb 2013)
- Outgoing EIC Editorial (Feb 2013)
- State of the Journal (Feb 2012)
- EIC Editorial (January 2011)
- Special Section on the 27th International Conference on Data Engineering (ICDE 2011) (Oct 2012)
- Special Section on Keyword Search on Structured Data (Dec 2011)
- Cloud Data Management (Sept 2011)
- Special Section on the 26th International Conference on Data Engineering (Aug 2011)
IEEE Transactions on Knowledge and Data Engineering (TKDE) is an archival journal published monthly designed to inform researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.