Issue No. 02 - March/April (2004 vol. 19)
Choh Man Teng, Institute for Human and Machine Cognition
<p>Data quality is crucial to any data-analysis task, yet blemishes in data can arise from many sources. We therefore need to understand the kinds of imperfections data can contain and how effective the various techniques for handling them are. The author compares three approaches: robust algorithms, which tolerate some corruption; filtering, which removes noisy instances from the input; and polishing, which corrects noisy instances rather than removing them. The author argues that polishing has theoretical advantages over the first two approaches and can achieve better results. He also discusses how to evaluate and validate data-correction methods, identifying pitfalls in metric design and offering suggestions for metrics that accurately reflect the extent of correction.</p>
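To make the difference between filtering and polishing concrete, here is a minimal sketch on a toy one-dimensional dataset with class labels. It is an illustration under stated assumptions, not the author's procedure: noise is detected with a hypothetical leave-one-out k-nearest-neighbour vote, filtering drops the flagged instances, and polishing keeps them but replaces the suspect label with the vote. The function names (`knn_vote`, `flag_noisy`, `filter_noise`, `polish_noise`, `correction_counts`) and the three-way breakdown of outcomes are assumptions chosen for illustration; the breakdown simply shows why a correction metric should count errors fixed, errors newly introduced, and errors left in place separately.

```python
from collections import Counter

def knn_vote(data, i, k=3):
    """Majority label among the k nearest neighbours of instance i (excluding i).

    Hypothetical noise detector used only for this sketch.
    """
    x_i = data[i][0]
    others = sorted((abs(x - x_i), y) for j, (x, y) in enumerate(data) if j != i)
    votes = Counter(y for _, y in others[:k])
    return votes.most_common(1)[0][0]

def flag_noisy(data, k=3):
    """Indices whose label disagrees with the neighbourhood vote."""
    return [i for i, (_, y) in enumerate(data) if knn_vote(data, i, k) != y]

def filter_noise(data, k=3):
    """Filtering: remove flagged instances from the dataset entirely."""
    flagged = set(flag_noisy(data, k))
    return [row for i, row in enumerate(data) if i not in flagged]

def polish_noise(data, k=3):
    """Polishing: keep every instance, but replace a flagged label with the vote."""
    flagged = set(flag_noisy(data, k))
    return [(x, knn_vote(data, i, k)) if i in flagged else (x, y)
            for i, (x, y) in enumerate(data)]

def correction_counts(noisy, polished, clean):
    """Compare labels before and after polishing against known-clean labels.

    Returns (fixed, newly_broken, still_wrong).
    """
    fixed = broken = still_wrong = 0
    for (_, n), (_, p), (_, c) in zip(noisy, polished, clean):
        if n != c and p == c:
            fixed += 1
        elif n == c and p != c:
            broken += 1
        elif n != c and p != c:
            still_wrong += 1
    return fixed, broken, still_wrong

if __name__ == "__main__":
    clean = [(1, "a"), (2, "a"), (3, "a"), (4, "a"),
             (10, "b"), (11, "b"), (12, "b"), (13, "b")]
    noisy = list(clean)
    noisy[2] = (3, "b")  # inject a single label error

    print(flag_noisy(noisy))         # [2]
    print(len(filter_noise(noisy)))  # 7: the suspect instance is gone
    polished = polish_noise(noisy)
    print(polished == clean)         # True: the label is restored instead
    print(correction_counts(noisy, polished, clean))  # (1, 0, 0)
```

In this toy case both methods neutralize the corrupted label, but filtering also discards the instance's (correct) feature value, whereas polishing retains it. Counting "newly broken" labels separately matters because a corrector that changes many labels can look effective on raw agreement while quietly damaging clean data.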
data cleaning, data correction, quality assessment
C. M. Teng, "Polishing Blemishes: Issues in Data Correction," in IEEE Intelligent Systems, vol. 19, no. 2, pp. 34-39, 2004.