16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'04)
Noise Identification with the k-Means Algorithm
Boca Raton, Florida
November 15-November 17
ISBN: 0-7695-2236-X
Wei Tang, Florida Atlantic University
Taghi M. Khoshgoftaar, Florida Atlantic University
The presence of noise in a measurement dataset can have a negative effect on the classification model built from it. More specifically, noisy instances in the dataset can adversely affect the learnt hypothesis. Removing noisy instances will improve the learnt hypothesis and thus the classification accuracy of the model. A clustering-based noise detection approach using the k-means algorithm is presented. We present a new metric, the noise factor, for measuring the potential of an instance to be noisy. Based on the computed noise factor values of the instances, the clustering-based algorithm is then used to identify and eliminate p% of the instances in the dataset. These p% of instances are considered the most likely to be noisy among the instances in the dataset; the value of p is varied from 1% to 40%. The noise detection approach is investigated with respect to two case studies of software measurement data obtained from NASA software projects. The two datasets are characterized by the same thirteen software metrics and a class label that classifies the program modules as either fault-prone or not fault-prone. It is shown that as more noisy instances are removed, the classification accuracy of the C4.5 learner improves. This indicates that the removed instances are most likely noisy instances that contributed to poor classification accuracy.
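The ranking-and-removal step described above can be illustrated with a rough sketch. The abstract does not define the noise factor, so the code below substitutes a stand-in score, the Euclidean distance from each instance to its assigned k-means centroid; this is an assumption and not the authors' metric. The function names `kmeans` and `filter_suspects`, the toy data, and the choice of k are likewise hypothetical, and neither the class label nor the C4.5 learner is modeled.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so centroids are too
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def filter_suspects(points, k, p_percent, seed=0):
    """Drop the p% of instances with the highest stand-in score.

    ASSUMPTION: the score is distance to the assigned centroid, used here
    only as a placeholder for the paper's noise factor.
    """
    centroids, labels = kmeans(points, k, seed=seed)
    scores = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    n_remove = int(len(points) * p_percent / 100)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:n_remove])
    kept = [i for i in range(len(points)) if i not in removed]
    return kept, sorted(removed)


# Toy data (hypothetical): two tight groups plus one stray instance at index 9.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1),
    (4.0, 4.0),
]
kept, removed = filter_suspects(points, k=2, p_percent=10)
```

Varying `p_percent` from 1 to 40 and retraining a classifier on the `kept` instances would mirror the experimental loop the abstract outlines, though the scores here should not be read as the published noise factor.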
Index Terms:
data noise, outliers, software quality estimation, software metrics, k-means, clustering
Citation:
Wei Tang, Taghi M. Khoshgoftaar, "Noise Identification with the k-Means Algorithm," ictai, pp.373-378, 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'04), 2004