Issue No.02 - March/April (2004 vol.19)
pp: 28-33
Mong Li Lee, National University of Singapore
Wynne Hsu, National University of Singapore
Vijay Kothari, National University of Singapore
ABSTRACT
Data quality problems can arise from abbreviations, data entry mistakes, duplicate records, missing fields, and many other sources. Data-cleaning research has focused on duplicate elimination or the merge/purge problem. Another problem is erroneous data called spurious links, where a real-world entity has multiple record links that might not be properly associated with it. One approach to this problem is to use context information to clean up the spurious links. This approach identifies and retrieves the data containing potential spurious links, then performs a context similarity comparison to determine records with high overlaps. The degree of overlapping context indicates the likelihood of spurious links. Experiments on three real-world data sets demonstrate that this approach can correctly identify spurious links and thus assist data cleaning.
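The context comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the record layout, so everything here is an assumption: context is modeled as a set of tokens per record, overlap is measured with the Jaccard coefficient, and the names `context_overlap`, `rank_candidate_pairs`, and the 0.5 threshold are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations


def context_overlap(a: set, b: set) -> float:
    """Jaccard overlap between two context sets (assumed measure, not from the article)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_pairs(records, threshold=0.5):
    """Compare every pair of candidate records and keep those with high context overlap.

    `records` is a list of (record_id, context_set) tuples that are assumed to
    have already been retrieved as containing potential spurious links.
    Returns (id_a, id_b, score) tuples sorted by descending score.
    """
    flagged = []
    for (id_a, ctx_a), (id_b, ctx_b) in combinations(records, 2):
        score = context_overlap(ctx_a, ctx_b)
        if score >= threshold:  # hypothetical cutoff for "high overlap"
            flagged.append((id_a, id_b, score))
    return sorted(flagged, key=lambda t: t[2], reverse=True)


# Hypothetical records: an identifier plus a set of context tokens.
records = [
    ("r1", {"a", "b", "c"}),
    ("r2", {"a", "b", "d"}),
    ("r3", {"x", "y"}),
]
print(rank_candidate_pairs(records))
```

In this sketch the score only ranks pairs for review; how a high overlap should be acted on during cleaning is left to the full article.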
INDEX TERMS
data cleaning, data quality problems, context information
CITATION
Mong Li Lee, Wynne Hsu, Vijay Kothari, "Cleaning the Spurious Links in Data", IEEE Intelligent Systems, vol.19, no. 2, pp. 28-33, March/April 2004, doi:10.1109/MIS.2004.1274908