Sixth IEEE International Conference on Data Mining (ICDM'06)
High Quality, Efficient Hierarchical Document Clustering Using Closed Interesting Itemsets
Hong Kong
December 18-December 22
ISBN: 0-7695-2701-9
Hassan H. Malik, Columbia University
John R. Kender, Columbia University
High dimensionality remains a significant challenge for document clustering. Recent approaches have used frequent itemsets and closed frequent itemsets to reduce dimensionality and to improve the efficiency of hierarchical document clustering. In this paper, we introduce the notion of "closed interesting" itemsets (i.e., closed itemsets with high interestingness). We provide heuristics such as "super item" to efficiently mine these itemsets and show that they provide significant dimensionality reduction over closed frequent itemsets.
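
To make the terminology concrete, the following minimal sketch enumerates frequent and closed frequent itemsets over a toy document collection, using the standard definition: an itemset is closed if no proper superset has the same support. It is a brute-force illustration only, not the paper's mining algorithm or its "super item" heuristic; the function names, the toy documents, and the `min_support` threshold are assumptions made for this example.

```python
from itertools import combinations


def support(itemset, docs):
    """Count the documents that contain every item in `itemset`."""
    return sum(1 for d in docs if itemset <= d)


def closed_frequent_itemsets(docs, min_support):
    """Brute-force enumeration; only suitable for tiny toy inputs."""
    vocab = sorted(set().union(*docs))
    frequent = {}
    for k in range(1, len(vocab) + 1):
        for combo in combinations(vocab, k):
            items = frozenset(combo)
            s = support(items, docs)
            if s >= min_support:
                frequent[items] = s
    # An itemset is closed if no proper superset has the same support.
    # Any such superset is itself frequent, so checking `frequent` suffices.
    closed = {
        items: s
        for items, s in frequent.items()
        if not any(items < other and s == s2 for other, s2 in frequent.items())
    }
    return frequent, closed


# Hypothetical toy documents, each represented as a set of terms.
docs = [
    frozenset({"data", "mining", "cluster"}),
    frozenset({"data", "mining"}),
    frozenset({"data", "mining", "text"}),
    frozenset({"cluster", "text"}),
]
frequent, closed = closed_frequent_itemsets(docs, min_support=2)
```

In this toy collection, {data} and {mining} are frequent but not closed, because {data, mining} occurs in exactly the same documents. Keeping only closed itemsets therefore yields fewer candidate features without losing support information.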

Using "closed interesting" itemsets, we propose a new, sub-linearly scalable, hierarchical document clustering method that outperforms state-of-the-art agglomerative, partitioning and frequent-itemset-based methods in terms of both clustering quality and runtime performance, without requiring dataset-specific parameter tuning. We evaluate twenty interestingness measures and show that, when used to generate "closed interesting" itemsets and to select parent nodes, Mutual Information, Added Value, Yule's Q and Chi-Square offer the best clustering performance.
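
As a rough illustration of the four measures named above, the sketch below computes them for a pair of items from a 2x2 contingency table of document counts, using common textbook definitions. The abstract does not state which normalizations the paper uses or how the measures are applied to itemsets with more than two items, so the formulas, function names, and example counts here are assumptions rather than the authors' implementation.

```python
import math

# Table layout (document counts) for items A and B:
#   n11: A and B present     n10: A present, B absent
#   n01: A absent, B present n00: both absent
# Degenerate tables (an empty row or column) are not handled in this sketch.


def added_value(n11, n10, n01, n00):
    """Directional Added Value for A -> B: P(B|A) - P(B)."""
    n = n11 + n10 + n01 + n00
    return n11 / (n11 + n10) - (n11 + n01) / n


def yules_q(n11, n10, n01, n00):
    """Yule's Q: (n11*n00 - n10*n01) / (n11*n00 + n10*n01)."""
    ad, bc = n11 * n00, n10 * n01
    return (ad - bc) / (ad + bc)


def chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 table."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den


def mutual_information(n11, n10, n01, n00):
    """Unnormalized mutual information, in bits, between the two indicators."""
    n = n11 + n10 + n01 + n00
    rows = (n11 + n10, n01 + n00)
    cols = (n11 + n01, n10 + n00)
    cells = ((n11, 0, 0), (n10, 0, 1), (n01, 1, 0), (n00, 1, 1))
    mi = 0.0
    for count, i, j in cells:
        if count:  # empty cells contribute zero
            mi += (count / n) * math.log2(count * n / (rows[i] * cols[j]))
    return mi


# Hypothetical counts: 10 documents, A and B co-occur in 3 of them.
table = (3, 1, 1, 5)
scores = {
    "added_value": added_value(*table),
    "yules_q": yules_q(*table),
    "chi_square": chi_square(*table),
    "mutual_information": mutual_information(*table),
}
```

All four scores are zero when the two items occur independently (for example, the table `(1, 1, 1, 1)`) and grow as co-occurrence departs from independence, which is the property that makes them usable as an interestingness threshold.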

Citation:
Hassan H. Malik, John R. Kender, "High Quality, Efficient Hierarchical Document Clustering Using Closed Interesting Itemsets," Sixth IEEE International Conference on Data Mining (ICDM'06), pp. 991-996, 2006