Decision Trees for Uncertain Data
PrePrint
ISSN: 1041-4347
Smith Tsang, The University of Hong Kong, Hong Kong
Ben Kao, The University of Hong Kong, Hong Kong
Kevin Y. Yip, Yale University, New Haven
Wai-Shing Ho, The University of Hong Kong, Hong Kong
Sau Dan Lee, The University of Hong Kong, Hong Kong
Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantisation errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by a single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we find that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item, that is, the probability density function (pdf) of that item's value, is utilised. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU-demanding than construction on certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
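To make the contrast between averaging and using the full distribution concrete, the following is a minimal sketch and not the authors' implementation. It assumes each uncertain value is given as a discrete list of (value, probability) sample points, and that a tuple is divided fractionally between the two sides of a split according to the probability mass on each side. The abstract does not describe this mechanism, so it is an assumption made for illustration. The names entropy, split_entropy and use_pdf are hypothetical.

```python
import math
from collections import defaultdict


def entropy(class_weights):
    """Shannon entropy (in bits) of a mapping from class label to weight."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    h = 0.0
    for w in class_weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def split_entropy(tuples, threshold, use_pdf=True):
    """Weighted entropy after splitting one attribute at `threshold`.

    tuples: list of (samples, label), where samples is a list of
            (value, probability) pairs whose probabilities sum to 1.
    use_pdf=True  -> assumption: each tuple contributes fractionally to
                     both sides, in proportion to its probability mass.
    use_pdf=False -> each tuple is collapsed to its mean and sent
                     entirely to one side.
    """
    left = defaultdict(float)
    right = defaultdict(float)
    for samples, label in tuples:
        if use_pdf:
            p_left = sum(p for v, p in samples if v <= threshold)
        else:
            mean = sum(v * p for v, p in samples)
            p_left = 1.0 if mean <= threshold else 0.0
        left[label] += p_left
        right[label] += 1.0 - p_left
    n_left = sum(left.values())
    n_right = sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)


# Two made-up tuples with the same mean (0) but different distributions.
data = [
    ([(-1.0, 0.5), (1.0, 0.5)], "A"),  # spread around 0
    ([(0.0, 1.0)], "B"),               # concentrated at 0
]

avg_based = split_entropy(data, threshold=-1.0, use_pdf=False)
pdf_based = split_entropy(data, threshold=-1.0, use_pdf=True)
print(avg_based, pdf_based)
```

In this toy data the two tuples share the same mean, so no threshold can separate them once they are reduced to averages, while the distribution-based split lowers the weighted entropy. The extra work per tuple (a pass over all sample points for every candidate threshold) also hints at why construction on uncertain data is more CPU-demanding.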
Index Terms:
Deduction and Theorem Proving and Knowledge Processing, Clustering, classification, and association rules, Data mining, Decision support, Knowledge and data engineering tools and techniques
Citation:
Smith Tsang, Ben Kao, Kevin Y. Yip, Wai-Shing Ho, Sau Dan Lee, "Decision Trees for Uncertain Data," IEEE Transactions on Knowledge and Data Engineering, 11 Aug. 2009. IEEE Computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.175>