Issue No.01 - January (2008 vol.20)
pp: 40-54
ABSTRACT
With advances in hardware and communication technologies, stream time series are gaining increasing attention because of their importance in many applications, such as financial data processing, network monitoring, web click-stream analysis, sensor data mining, and anomaly detection. All of these applications require efficient and effective similarity search over stream data. Although many approaches have been proposed for searching archived data, traditional methods may not work in stream scenarios because of the unique characteristics of streams: data are frequently updated and real-time responses are required. In particular, when the arrival of data is delayed, for example by communication congestion or batch processing, traditional approaches may return inaccurate answers to queries on such incomplete time series, or on future time series. Therefore, in this paper we propose three approaches, polynomial, DFT, and probabilistic, to predict the unknown values that have not yet arrived at the system and to answer queries based on the predicted data. We also present efficient indexes, namely a multidimensional hash index and a B+-tree, to facilitate prediction and similarity search on future time series, respectively. Extensive experiments demonstrate the efficiency and effectiveness of our methods in terms of I/O, prediction accuracy, and query accuracy.
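The general idea of predicting values that have not yet arrived and then answering a similarity query on the predicted data can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the simplest polynomial case (a degree-1 least-squares fit), Euclidean distance as the similarity measure, and no index. The function names `fit_line`, `predict_future`, `euclidean`, and `matches_future`, and the threshold `epsilon`, are hypothetical and chosen only for illustration.

```python
from math import sqrt


def fit_line(values):
    """Least-squares line through the points (0, values[0]), ..., (n-1, values[n-1]).

    Assumption: a degree-1 polynomial stands in for the general polynomial case.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observed values")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return slope, intercept


def predict_future(values, horizon):
    """Extrapolate the fitted line to the next `horizon` timestamps."""
    slope, intercept = fit_line(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(horizon)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches_future(observed, pattern, epsilon):
    """True if the predicted continuation lies within `epsilon` of `pattern`."""
    predicted = predict_future(observed, len(pattern))
    return euclidean(predicted, pattern) <= epsilon


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]       # values that have already arrived
    print(predict_future(observed, 2))    # predicted values for the next two timestamps
    print(matches_future(observed, [5.0, 6.5], epsilon=1.0))
```

In this sketch every query refits the line and scans the prediction directly; the hash index and B+-tree mentioned in the abstract serve to avoid that kind of per-query work, and are not modeled here.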
INDEX TERMS
Information Search and Retrieval, Search process, Multimedia databases, Query processing
CITATION
Xiang Lian, Lei Chen, "Efficient Similarity Search over Future Stream Time Series", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 1, pp. 40-54, January 2008, doi:10.1109/TKDE.2007.190666