Polishing Blemishes: Issues in Data Correction
March/April 2004 (vol. 19 no. 2)
pp. 34-39
Choh Man Teng, Institute for Human and Machine Cognition

Data quality is crucial to any data-analysis task, yet blemishes in data can arise from many sources. We must therefore understand data imperfections and the effectiveness of various imperfection-handling techniques. The author compares three approaches: robust algorithms, which tolerate some corruption in the data; filtering, which eliminates noisy instances from the input; and polishing, which corrects noisy instances rather than removing them. The author argues that polishing has theoretical advantages over the first two approaches and can achieve better results. The author also discusses how to evaluate and validate data-correction methods, identifying pitfalls in designing evaluation metrics and suggesting ways to design metrics that accurately reflect the extent of correction.

Index Terms:
data cleaning, data correction, quality assessment
Citation:
Choh Man Teng, "Polishing Blemishes: Issues in Data Correction," IEEE Intelligent Systems, vol. 19, no. 2, pp. 34-39, March-April 2004, doi:10.1109/MIS.2004.1274909