2009 Ninth IEEE International Conference on Data Mining
A Tree-Based Framework for Difference Summarization
Miami, Florida
December 06-December 09
ISBN: 978-0-7695-3895-2
Understanding the differences between two datasets is a fundamental data mining question and is important across many real-world scientific applications. In this paper, we propose a tree-based framework that provides a parsimonious explanation of the difference between two distributions, based on a rigorous two-sample statistical test. We develop two efficient approaches. The first is a dynamic programming approach that finds a minimal number of data subsets describing the difference between the two datasets. The second is a greedy approach that approximates the dynamic programming approach. We employ the well-known Friedman's MST (minimal spanning tree) statistic for two-sample statistical tests in our summarization tree construction, and develop novel techniques to speed up its computation. We performed a detailed experimental evaluation on both real and synthetic datasets and demonstrated the effectiveness of our tree-summarization approach.
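To make the MST-based two-sample test named in the abstract concrete, the following is a minimal sketch of the Friedman-Rafsky runs statistic, not the authors' implementation. It pools the two samples, builds a Euclidean minimal spanning tree, and counts R = 1 + (number of tree edges joining points from different samples); a small R suggests the samples differ. The function names (`mst_edges`, `runs_statistic`, `friedman_rafsky`) are hypothetical, and the permutation p-value is a simple stand-in assumed here for illustration. It does not reflect the paper's speedup techniques.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns N-1 index pairs."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    # Cheapest known connection from the growing tree to each outside point.
    best_dist = [math.dist(points[0], p) for p in points]
    best_from = [0] * n
    edges = []
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best_dist[k])
        in_tree[j] = True
        edges.append((best_from[j], j))
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best_dist[k]:
                    best_dist[k], best_from[k] = d, j
    return edges


def runs_statistic(edges, labels):
    """R = 1 + number of MST edges whose endpoints carry different sample labels."""
    return 1 + sum(1 for i, j in edges if labels[i] != labels[j])


def friedman_rafsky(xs, ys, n_perm=999, seed=0):
    """Return (observed R, null expectation of R, permutation p-value).

    The p-value is an illustrative assumption: labels are shuffled over the
    fixed MST, and we count how often R is as small as the observed value.
    """
    pooled = list(xs) + list(ys)
    labels = [0] * len(xs) + [1] * len(ys)
    edges = mst_edges(pooled)
    observed = runs_statistic(edges, labels)
    m, n = len(xs), len(ys)
    expected = 2 * m * n / (m + n) + 1  # E[R] under the null hypothesis
    rng = random.Random(seed)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if runs_statistic(edges, shuffled) <= observed:
            hits += 1
    return observed, expected, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    ys = [(10, 10), (10, 11), (11, 10), (11, 11)]
    print(friedman_rafsky(xs, ys))
```

Because the MST depends only on the pooled points and not on their labels, the tree is built once and reused for every permutation. This quadratic-time Prim's implementation is adequate only for small samples.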
Index Terms:
difference summarization, minimal spanning tree, two-sample test, Friedman-Rafsky test, Chi-square test, Kolmogorov-Smirnov test
Citation:
Ruoming Jin, Yuri Breitbart, Rong Li, "A Tree-Based Framework for Difference Summarization," 2009 Ninth IEEE International Conference on Data Mining (ICDM), pp. 209-218, 2009