Optimizing Similarity Search for Arbitrary Length Time Series Queries
April 2004 (vol. 16 no. 4)
pp. 418-433

Abstract—We consider the problem of finding similar patterns in a time sequence. Typical applications of this problem involve large databases consisting of long time sequences of different lengths. Current time sequence search techniques work well for queries of a prespecified length, but not for arbitrary length queries. We propose a novel indexing technique that works well for arbitrary length queries. The proposed technique stores index structures at different resolutions for a given data set. We prove that this index structure is superior to existing index structures that use a single resolution. We propose range query and nearest neighbor query techniques on this index structure and prove the optimality of our index structure for these search techniques. The experimental results show that our method is 4 to 20 times faster than the current techniques, including Sequential Scan, for range queries, and 3 times faster than Sequential Scan and other techniques for nearest neighbor queries. Because information must be stored at multiple resolution levels, the storage requirement of our method could potentially be large. In the second part of the paper, we show how the index information can be compressed with minimal information loss. According to our experimental results, even after compressing the index to one fifth of its size, the total cost of our method is 3 to 15 times lower than that of the current techniques.
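
The abstract does not give the details of the index, so the Python sketch below only illustrates, under assumptions of our own, the general multi-resolution filter-and-refine idea it describes. Query pieces are assumed to have lengths w·2^i, and each piece is summarized by the first k coefficients of an orthonormal Haar wavelet transform. Because that transform preserves Euclidean distance and the pieces are disjoint, the summed feature distance is a lower bound on the true distance, so pruning on it never discards a true match. The function names `haar`, `decompose`, `lower_bound`, `euclid`, `range_query`, and `nearest_neighbor`, and the parameters `w`, `levels`, and `k`, are hypothetical; a linear scan over offsets stands in for the on-disk index structures, and this is not the authors' implementation.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar(x):
    """Orthonormal Haar transform (len(x) must be a power of two).
    Output order: overall average first, then details from coarse to fine."""
    out, details = list(x), []
    while len(out) > 1:
        half = len(out) // 2
        avgs = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        diffs = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        details = diffs + details  # coarser details go in front
        out = avgs
    return out + details


def decompose(n, w, levels):
    """Hypothetical greedy split of a query of length n into pieces whose
    sizes are w * 2**i (largest first); a tail shorter than w is ignored."""
    sizes = [w * 2 ** i for i in reversed(range(levels))]
    pieces, start = [], 0
    for size in sizes:
        while n - start >= size:
            pieces.append((start, size))
            start += size
    return pieces


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lower_bound(q, s, pieces, k):
    """Distance between truncated Haar features of each piece, combined.
    Never exceeds euclid(q, s), because truncation and ignoring the
    uncovered tail can only drop nonnegative terms."""
    total = 0.0
    for start, size in pieces:
        fq = haar(q[start:start + size])[:k]
        fs = haar(s[start:start + size])[:k]
        total += sum((a - b) ** 2 for a, b in zip(fq, fs))
    return math.sqrt(total)


def range_query(series, q, eps, w=4, levels=3, k=2):
    """Offsets whose window is within eps of q (filter, then refine)."""
    pieces = decompose(len(q), w, levels)
    hits = []
    for off in range(len(series) - len(q) + 1):
        window = series[off:off + len(q)]
        if lower_bound(q, window, pieces, k) > eps:
            continue  # safe prune: true distance >= lower bound > eps
        if euclid(q, window) <= eps:
            hits.append(off)
    return hits


def nearest_neighbor(series, q, w=4, levels=3, k=2):
    """1-NN: visit candidates in increasing lower-bound order and stop once
    the next lower bound exceeds the best exact distance found so far."""
    pieces = decompose(len(q), w, levels)
    m = len(q)
    ranked = sorted(
        (lower_bound(q, series[o:o + m], pieces, k), o)
        for o in range(len(series) - m + 1)
    )
    best_off, best_d = None, math.inf
    for lb, off in ranked:
        if lb > best_d:
            break  # no remaining candidate can beat best_d
        d = euclid(q, series[off:off + m])
        if d < best_d:
            best_off, best_d = off, d
    return best_off, best_d
```

For example, with `series = [float(i % 7) for i in range(40)]` and `q = series[10:23]`, `range_query(series, q, 0.0)` returns `[3, 10, 17, 24]`, the offsets where the period-7 pattern repeats exactly. A smaller `k` stores fewer coefficients per piece but loosens the bound, so more candidates reach the exact-distance check.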

Index Terms:
Time series, subsequence search, range query, nearest neighbor query, multiple resolutions.
Citation:
Tamer Kahveci, Ambuj K. Singh, "Optimizing Similarity Search for Arbitrary Length Time Series Queries," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 4, pp. 418-433, April 2004, doi:10.1109/TKDE.2004.1269667