Sixth IEEE International Conference on Data Mining (ICDM'06)
Discovering Unrevealed Properties of Probability Estimation Trees: On Algorithm Selection and Performance Explanation
Hong Kong
December 18-December 22
ISBN: 0-7695-2701-9
Kun Zhang, Tulane University, USA
Wei Fan, IBM T.J. Watson Research, USA
Bill Buckles, Tulane University, USA
Xiaojing Yuan, University of Houston
Zujia Xu, Dillard University
There has been increasing interest in designing better probability estimation trees, or PETs, for ranking and probability estimation. Capable of generating class membership probabilities, PETs have been shown to be highly accurate and flexible for many difficult problems, such as cost-sensitive learning and matching skewed distributions. A large number of PET algorithms are available, and about ten of them are well known. This variety is an advantage, but it also creates confusion in practice. One would ask, "Given a new dataset, which algorithm should be chosen, and what performance should and should not be expected? What explains good or bad performance in different situations?" In this paper, we systematically answer these important questions for the first time by conducting a large-scale empirical comparison of five popular PETs, examining their AUC, MSE and error rate "learning curves" (instead of training-test split based cross-validation). Using the maximum AUC achieved by any of the evaluated probability estimation tree algorithms, we demonstrate that the preference of a probability estimation tree on different evaluation metrics can be accurately characterized by the "signal-noise separability" of the dataset, as well as some other observable statistics of the dataset explained further in the paper. Moreover, in order to understand their relative performance, many important and previously unrevealed properties of each PET's mechanism and heuristics are analyzed and evaluated. Importantly, a practical guide for choosing the most appropriate PET algorithm given a new data mining problem is provided.
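As an illustration of the three evaluation metrics named in the abstract, the following is a minimal sketch of how AUC, MSE and error rate could be traced along a learning curve for a binary problem. It is not the authors' code. The function names, the `fit` callback, the training fractions and the 0.5 decision threshold are all assumptions made for this example, and the paper's exact MSE definition may differ from the squared error on the positive-class probability used here.

```python
import random

def auc(y_true, p_hat):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def mse(y_true, p_hat):
    """Mean squared error between the estimated P(class = 1) and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def error_rate(y_true, p_hat, threshold=0.5):
    """Fraction of examples misclassified when predicting 1 iff p >= threshold (assumed cutoff)."""
    return sum((p >= threshold) != (y == 1) for y, p in zip(y_true, p_hat)) / len(y_true)

def learning_curve(fit, train, test, fractions=(0.1, 0.25, 0.5, 1.0), seed=0):
    """Evaluate a probability estimator on growing random subsets of the training data.

    `fit` is a hypothetical callback: it takes a list of (x, y) pairs and returns
    a function mapping x to an estimated probability of class 1.
    """
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    x_test = [x for x, _ in test]
    y_test = [y for _, y in test]
    curve = []
    for frac in fractions:
        n = max(1, int(frac * len(shuffled)))
        predict = fit(shuffled[:n])
        p_hat = [predict(x) for x in x_test]
        curve.append((n, auc(y_test, p_hat), mse(y_test, p_hat), error_rate(y_test, p_hat)))
    return curve
```

Any PET implementation could be plugged in through `fit`. Each tuple in the returned list gives the training-set size followed by the three metric values measured on the fixed test set.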
Citation:
Kun Zhang, Wei Fan, Bill Buckles, Xiaojing Yuan, Zujia Xu, "Discovering Unrevealed Properties of Probability Estimation Trees: On Algorithm Selection and Performance Explanation," Sixth IEEE International Conference on Data Mining (ICDM'06), pp. 741-752, 2006