Decision Trees for Mining Data Streams Based on the Gaussian Approximation
Jan. 2014 (vol. 26 no. 1)
pp. 108-119
Leszek Rutkowski, Czestochowa University of Technology, Czestochowa
Maciej Jaworski, Czestochowa University of Technology, Czestochowa
Lena Pietruczuk, Czestochowa University of Technology, Czestochowa
Piotr Duda, Czestochowa University of Technology, Czestochowa
Since the Hoeffding tree algorithm was proposed in the literature, decision trees have become one of the most popular tools for mining data streams. The key step in constructing a decision tree is determining the best attribute on which to split the node under consideration. Several methods for solving this problem have been presented so far. However, they are either incorrectly justified mathematically (e.g., the Hoeffding tree algorithm) or time-consuming (e.g., the McDiarmid tree algorithm). In this paper, we propose a new method that significantly outperforms the McDiarmid tree algorithm and has a solid mathematical basis. Our method ensures, with a high probability set by the user, that the best attribute chosen for the considered node using a finite data sample is the same as the one that would be chosen using the whole data stream.
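The split decision described in the abstract can be illustrated with a minimal sketch. It computes the information gain of each attribute on the sample seen so far and splits only when the gap between the best and second-best attribute exceeds a threshold epsilon. The threshold used here is the classic Hoeffding-style bound, which is the very justification the abstract criticizes; the paper's own Gaussian-approximation bound is not given in the abstract, so `hoeffding_epsilon` is only a stand-in. All function names and parameters are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on column index `attr`."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def hoeffding_epsilon(value_range, delta, n):
    """Hoeffding-style threshold (stand-in, NOT the paper's Gaussian bound)."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(rows, labels, n_classes, delta):
    """Return (split?, best attribute index) for the current node sample.

    Assumes at least two attributes, so a second-best gain exists.
    """
    gains = sorted(
        ((information_gain(rows, labels, a), a) for a in range(len(rows[0]))),
        reverse=True,
    )
    (g_best, a_best), (g_second, _) = gains[0], gains[1]
    # Information gain lies in [0, log2(n_classes)].
    eps = hoeffding_epsilon(math.log2(n_classes), delta, len(labels))
    return g_best - g_second > eps, a_best


# Attribute 0 determines the class; attribute 1 is uninformative.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 50
labels = [r[0] for r in rows]
decision, best_attr = should_split(rows, labels, n_classes=2, delta=1e-3)
```

In this sketch `delta` plays the role of the user-set error probability mentioned in the abstract: a smaller `delta` or a smaller sample makes the threshold larger, so the node waits for more data before committing to a split.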
Index Terms:
Decision trees, Entropy, Training, Data mining, Impurities, Indexes, Random variables, Gaussian approximation, Data stream, decision trees, information gain
Citation:
Leszek Rutkowski, Maciej Jaworski, Lena Pietruczuk, Piotr Duda, "Decision Trees for Mining Data Streams Based on the Gaussian Approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108-119, Jan. 2014, doi:10.1109/TKDE.2013.34