Proceedings 2001 IEEE International Conference on Data Mining (2001)
San Jose, California
Nov. 29, 2001 to Dec. 2, 2001
In this paper we show that most hierarchical agglomerative clustering (HAC) algorithms follow a 90-10 rule: roughly the first 90% of iterations merge cluster pairs whose dissimilarity is less than 10% of the maximum dissimilarity. We propose two algorithms, 2-phase and nested, based on partially overlapping partitioning (POP). To handle high-dimensional data efficiently, we propose a tree structure particularly suitable for POP. Extensive experiments show that the proposed algorithms significantly reduce the time and memory requirements of existing HAC algorithms without compromising accuracy.
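The 90-10 rule stated in the abstract can be checked empirically on any data set by recording the dissimilarity of each merge and counting how many fall below 10% of the largest one. The following is a minimal sketch of that measurement only, not the authors' POP algorithms. It assumes single linkage with Euclidean distance (the abstract does not fix a linkage or a dissimilarity measure), and the function names `single_linkage_merges` and `fraction_below` are hypothetical.

```python
import math
from itertools import combinations


def single_linkage_merges(points):
    """Naive O(n^3) single-linkage HAC (an assumed linkage, for illustration).

    Returns the dissimilarity of each merge, in the order the merges occur.
    """
    clusters = [[p] for p in points]  # start with one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters under single linkage.
        for i, j in combinations(range(len(clusters)), 2):
            d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # j > i, so index i stays valid
        merges.append(d)
    return merges


def fraction_below(merges, ratio=0.1):
    """Fraction of merges whose dissimilarity is below ratio * max dissimilarity."""
    threshold = ratio * max(merges)
    return sum(d < threshold for d in merges) / len(merges)


if __name__ == "__main__":
    # Toy 1-D data: four close points and one distant outlier.
    pts = [(0.0,), (1.0,), (2.0,), (3.0,), (100.0,)]
    merges = single_linkage_merges(pts)
    print(merges)                  # [1.0, 1.0, 1.0, 97.0]
    print(fraction_below(merges))  # 0.75
```

On this toy input three of the four merges fall below 10% of the maximum dissimilarity. Whether a real data set comes close to the 90% figure depends on the data and on the linkage used.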
M. Dash, H. Liu and K. L. Tan, "Efficient Yet Accurate Clustering," Proceedings 2001 IEEE International Conference on Data Mining (ICDM), San Jose, California, 2001, pp. 99.