Third IEEE International Conference on Data Mining (ICDM'03)
Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research
Melbourne, Florida
November 19-November 22
ISBN: 0-7695-1978-4
Eamonn Keogh, University of California - Riverside
Jessica Lin, University of California - Riverside
Wagner Truppel, University of California - Riverside
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention.
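To make the second format concrete, the following is a minimal sketch of sliding-window extraction, assuming a window that advances one point at a time. The function name `sliding_window_subsequences` and the parameter `w` are illustrative choices, not names taken from the paper.

```python
from typing import List, Sequence


def sliding_window_subsequences(series: Sequence[float], w: int) -> List[List[float]]:
    """Return every contiguous length-w subsequence of `series`, in order.

    Hypothetical helper for illustration: the window slides one point at a
    time, so a series of length m yields m - w + 1 subsequences.
    """
    if w < 1 or w > len(series):
        raise ValueError("window length must satisfy 1 <= w <= len(series)")
    return [list(series[i:i + w]) for i in range(len(series) - w + 1)]


# A toy series of length 5 with a window of length 3 gives 3 subsequences.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
subsequences = sliding_window_subsequences(series, w=3)
for s in subsequences:
    print(s)
```

Note that consecutive subsequences share all but one point. This heavy overlap is what distinguishes the sliding-window case from clustering many independent time series.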
In this work we make a surprising claim. Clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature.
We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work.
Index Terms:
Time Series, Data Mining, Clustering, Rule Discovery
Citation:
Eamonn Keogh, Jessica Lin, Wagner Truppel, "Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research," Third IEEE International Conference on Data Mining (ICDM'03), p. 115, 2003