Issue No.07 - July (2008 vol.20)
pp: 992-1006
ABSTRACT
The past decade has seen a wealth of research on time series representations. The vast majority of this research has concentrated on representations that are calculated in batch mode and represent each value with approximately equal fidelity. However, the increasing deployment of mobile devices and real-time sensors has brought home the need for representations that can be incrementally updated and can approximate the data with fidelity proportional to its age. The latter property allows us to answer queries about the recent past with greater precision, since in many domains recent information is more useful than older information. We call such representations amnesic. While there has been previous work on amnesic representations, the class of amnesic functions possible was dictated by the representation itself. In this work, we introduce a novel representation of time series that can represent arbitrary, user-specified amnesic functions. We propose online algorithms for our representation and discuss their properties. Finally, we perform an extensive empirical evaluation on 40 datasets and show that our approach can efficiently maintain a high-quality amnesic approximation.
INDEX TERMS
time series, amnesic approximation, streaming algorithm
CITATION
Themis Palpanas, Michail Vlachos, Eamonn Keogh, Dimitrios Gunopulos, "Streaming Time Series Summarization Using User-Defined Amnesic Functions", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 7, pp. 992-1006, July 2008, doi:10.1109/TKDE.2007.190737