Issue No.07 - July (2008 vol.20)

pp: 992-1006

ABSTRACT

The past decade has seen a wealth of research on time series representations. The vast majority of research has concentrated on representations that are calculated in batch mode and represent each value with approximately equal fidelity. However, the increasing deployment of mobile devices and real time sensors has brought home the need for representations that can be incrementally updated, and can approximate the data with fidelity proportional to its age. The latter property allows us to answer queries about the recent past with greater precision, since in many domains recent information is more useful than older information. We call such representations amnesic. While there has been previous work on amnesic representations, the class of amnesic functions possible was dictated by the representation itself. In this work, we introduce a novel representation of time series that can represent arbitrary, user-specified amnesic functions. We propose online algorithms for our representation, and discuss their properties. Finally, we perform an extensive empirical evaluation on 40 datasets, and show that our approach can efficiently maintain a high quality amnesic approximation.
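To make the idea of a user-specified amnesic function concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes a piecewise-constant summary with a fixed segment budget and a greedy merge rule; the names `Segment`, `AmnesicSummary`, `budget`, and `amnesic` are hypothetical. The amnesic function maps the age of a point to the error tolerated at that age, so older regions become cheaper to merge and end up represented more coarsely.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    start: int        # index of the first point covered
    count: int        # number of points covered
    total: float      # sum of the covered values
    sq_total: float   # sum of the squared covered values

    @property
    def mean(self) -> float:
        return self.total / self.count

    def sse(self) -> float:
        # Squared error of approximating every covered point by the mean.
        return max(self.sq_total - self.total ** 2 / self.count, 0.0)


def merge(a: Segment, b: Segment) -> Segment:
    # Combine two adjacent segments (a immediately precedes b).
    return Segment(a.start, a.count + b.count,
                   a.total + b.total, a.sq_total + b.sq_total)


class AmnesicSummary:
    """Hypothetical streaming summary; NOT the paper's algorithm."""

    def __init__(self, budget: int, amnesic: Callable[[int], float]):
        self.budget = budget      # maximum number of segments kept
        self.amnesic = amnesic    # age -> tolerated error, must be > 0
        self.segments: List[Segment] = []
        self.now = 0              # number of points seen so far

    def append(self, value: float) -> None:
        self.segments.append(Segment(self.now, 1, value, value * value))
        self.now += 1
        if len(self.segments) > self.budget:
            self._merge_cheapest()

    def _merge_cheapest(self) -> None:
        best_i, best_cost = 0, float("inf")
        for i in range(len(self.segments) - 1):
            merged = merge(self.segments[i], self.segments[i + 1])
            # Age of the newest point inside the candidate merged segment.
            age = self.now - (merged.start + merged.count)
            # Error is discounted by the tolerance allowed at that age.
            cost = merged.sse() / self.amnesic(age)
            if cost < best_cost:
                best_i, best_cost = i, cost
        self.segments[best_i:best_i + 2] = [
            merge(self.segments[best_i], self.segments[best_i + 1])
        ]


# Usage: a linear amnesic function tolerates more error as data ages.
summary = AmnesicSummary(budget=10, amnesic=lambda age: 1.0 + age)
for i in range(500):
    summary.append(float((i * 37) % 11))
```

Swapping the lambda for another positive function of age (for example an exponential, or a step function that models a sliding window) changes how quickly fidelity decays without touching the rest of the sketch. The linear scan over candidate merges is chosen for brevity only.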

INDEX TERMS

time series, amnesic approximation, streaming algorithm

CITATION

Michail Vlachos, Eamonn Keogh, Dimitrios Gunopulos, "Streaming Time Series Summarization Using User-Defined Amnesic Functions",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 20, no. 7, pp. 992-1006, July 2008, doi:10.1109/TKDE.2007.190737