2016 IEEE International Conference on Smart Cloud (SMARTCLOUD) (2016)
New York, New York, USA
Nov. 18, 2016 to Nov. 20, 2016
Distance-based and density-based clustering algorithms are often used on large spatial data sets and on data sets of arbitrary shape. However, some well-known clustering algorithms have trouble when the distribution of objects in the data set varies, which may lead to poor clustering results. This degradation is even more pronounced on high-dimensional data sets. Recently, Rodriguez and Laio proposed an efficient clustering algorithm based on two essential indicators, density and distance, which are used to find the cluster centers and play an important role in the clustering process. However, this algorithm does not work well on high-dimensional data sets, since the threshold for cluster centers is defined ambiguously and hence has to be decided visually and manually. In this paper, an alternative definition of the indicators is introduced, and the threshold for cluster centers is decided automatically by using an improved Canopy algorithm. With the centers fixed (each representing a cluster), each remaining data object is assigned to a cluster dependently in a single step. The performance of the algorithm is analyzed on several benchmarks. The experimental results show that (1) the clustering performance on some high-dimensional data sets, e.g., intrusion detection, is better, and (2) on low-dimensional data sets, the performance is as good as that of traditional clustering algorithms.
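To make the two indicators concrete, the following is a minimal sketch of the density and distance computation in the style of the Rodriguez and Laio algorithm that the abstract builds on. It is not the method of this paper: the abstract does not give the alternative indicator definitions or the details of the improved Canopy step. The function name `density_peaks`, the cutoff parameter `d_c`, and the rule of picking the `n_centers` points with the largest product of density and distance are assumptions made purely for illustration.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Illustrative density/distance clustering sketch (hypothetical helper).

    points    : list of equal-length numeric tuples
    d_c       : cutoff distance used to count neighbours (assumed parameter)
    n_centers : number of centers to keep (assumption; stands in for the
                automatic threshold, which the abstract does not specify)
    """
    if n_centers < 1:
        raise ValueError("n_centers must be at least 1")
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Density indicator: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # Distance indicator: distance to the nearest point visited earlier,
    # i.e. the nearest point of higher density. The densest point gets
    # its largest distance to any other point instead.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Assumed center rule: largest rho * delta. Sorting `order` (stable)
    # guarantees the densest point is always among the centers.
    centers = sorted(order, key=lambda i: -(rho[i] * delta[i]))[:n_centers]

    # Single pass: every remaining point inherits the label of its
    # nearest higher-density neighbour, which is already labelled.
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, centers, labels
```

Called on two well-separated groups of points, for example `density_peaks(pts, d_c=1.1, n_centers=2)`, the sketch returns one center per group and assigns the remaining points in a single pass over the density ordering. The all-pairs distance matrix makes it quadratic in the number of points, so it is suited only to small illustrative inputs.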
Clustering algorithms, Algorithm design and analysis, Partitioning algorithms, Approximation algorithms, Measurement, Shape, Standards
R. Zhou et al., "A Distance and Density-Based Clustering Algorithm Using Automatic Peak Detection," 2016 IEEE International Conference on Smart Cloud (SMARTCLOUD), New York, New York, USA, 2016, pp. 176-183.