2017 IEEE International Conference on Data Mining Workshops (ICDMW) (2017)
New Orleans, Louisiana, USA
Nov. 18, 2017 to Nov. 21, 2017
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2017.11
Clustering is an important branch of data mining and statistical analysis and is widely used in exploratory analysis. Many algorithms exist for clustering in Euclidean space. However, time series clustering introduces new problems, such as inadequate distance measures, inaccurate descriptions of cluster centers, and a lack of efficient and accurate clustering techniques. When dealing with time series data, Dynamic Time Warping (DTW) is an accepted and effective distance measure. For cluster updates and representation, the DTW Barycenter Averaging (DBA) algorithm is a global averaging method that uses DTW and has proven to be effective for time series data. In this paper, we propose a Distance Density clustering method, a medoid-based clustering method that takes time series density into account and provides clustering results in a hierarchical fashion. First, we introduce two clustering initialization techniques: from the time series similarity matrix, we use majority voting to determine either the nearest or the furthest time series as the initial clustering seed. By doing so, our clustering method is deterministic, and the clustering results can always be reproduced. In the Distance Density clustering algorithm, we use medoids because a medoid is a more representative alternative to the statistical mean, especially with time series data, where the mean value is often non-existent. The time series density is a virtual density based on time series similarity; it can find more natural splits in a dataset, and the number of clusters does not need to be determined a priori. Experiments using the Distance Density clustering technique on the UCR dataset demonstrate that clustering initialization is crucial in obtaining stable results that are better on average than those of random initialization, and that the method is also more accurate than traditional distance clustering.
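To make the two building blocks named in the abstract concrete, the following is a minimal sketch of a DTW distance and of medoid selection from a pairwise DTW matrix. It is not the authors' implementation: the function names `dtw_distance` and `medoid_index` are hypothetical, the local cost is assumed to be the absolute difference, and no warping window, density computation, or majority-voting initialization is included.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW between two numeric sequences.

    Assumption: local cost is |a[i] - b[j]| and no warping window is used.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier b
                cost[i][j - 1],      # b[j-1] also aligned with an earlier a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def medoid_index(series):
    """Index of the series with the smallest total DTW distance to all others."""
    totals = [sum(dtw_distance(s, t) for t in series) for s in series]
    return min(range(len(series)), key=totals.__getitem__)


if __name__ == "__main__":
    data = [[0, 0, 0], [1, 1, 1], [5, 5, 5]]
    print(medoid_index(data))
```

Unlike a pointwise mean, the medoid returned here is always one of the input series, which is the property the abstract appeals to when it prefers medoids for time series.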
data mining, mathematics computing, matrix algebra, pattern clustering, statistical analysis, time series
R. Ma and R. Angryk, "Distance and Density Clustering for Time Series Data," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, Louisiana, USA, 2018, pp. 25-32.