Querying Time Series Data Based on Similarity
September/October 2000 (vol. 12 no. 5)
pp. 675-693

Abstract—We study similarity queries for time series data where similarity is defined, in a fairly general way, in terms of a distance function and a set of affine transformations on the Fourier series representation of a sequence. We identify a safe set of transformations supporting a wide variety of comparisons and show that this set is rich enough to formulate operations such as moving average and time scaling. We also show that queries expressed using safe transformations can be computed efficiently without prior knowledge of the transformations. We present a query processing algorithm that uses the underlying multidimensional index built over the data set to answer similarity queries efficiently. Our experiments show that the performance of this algorithm is competitive with that of processing ordinary (exact match) queries using the index, and much faster than sequential scanning. We propose a generalization of this algorithm for handling multiple transformations simultaneously, and give experimental results on the performance of the generalized algorithm.
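The Python sketch below illustrates the main ingredients of the abstract under stated assumptions; it is not the authors' algorithm. It assumes Euclidean distance, a unitary DFT (so distances are preserved by Parseval's theorem), a transformation represented as a pair of coefficient vectors (a, b) applied as a·X + b, and a circular m-point moving average, which the convolution theorem turns into such a coefficient-wise transformation. The helper names (`dft`, `moving_average`, `apply_transform`, `range_query`) and the parameter `k` are hypothetical, and a linear scan over precomputed DFTs stands in for the multidimensional index described in the paper.

```python
import numpy as np


def dft(x):
    """Unitary DFT of a real sequence; with norm="ortho" the Euclidean
    distance between two sequences equals the distance between their DFTs."""
    return np.fft.fft(np.asarray(x, dtype=float), norm="ortho")


def moving_average(n, m):
    """Hypothetical helper: return (a, b) such that a * dft(x) + b is the DFT
    of the m-point circular moving average of a length-n sequence x."""
    w = np.zeros(n)
    w[:m] = 1.0 / m
    # Convolution theorem: dft(circular_conv(x, w)) == dft(x) * np.fft.fft(w)
    return np.fft.fft(w), np.zeros(n, dtype=complex)


def apply_transform(t, X):
    """Apply an affine transformation t = (a, b) coefficient by coefficient."""
    a, b = t
    return a * X + b


def range_query(data_dfts, q, t, eps, k=3):
    """Return indices i such that ||T(X_i) - T(Q)|| <= eps.

    The distance over the first k transformed coefficients is a lower bound
    on the full distance, so sequences pruned by it cannot be true answers.
    A linear scan over precomputed DFTs stands in for a multidimensional index.
    """
    Q = apply_transform(t, dft(q))
    hits = []
    for i, X in enumerate(data_dfts):
        TX = apply_transform(t, X)
        if np.linalg.norm(TX[:k] - Q[:k]) > eps:
            continue  # filter step: partial distance already exceeds eps
        if np.linalg.norm(TX - Q) <= eps:
            hits.append(i)  # refinement step on the full representation
    return hits


if __name__ == "__main__":
    x = np.arange(1.0, 9.0)                 # 1, 2, ..., 8
    data = [x, x + 0.1, x[::-1]]
    data_dfts = [dft(s) for s in data]      # DFTs computed once, like an index build
    t = moving_average(len(x), 3)
    print(range_query(data_dfts, x, t, eps=0.5))
```

With these inputs the script prints `[0, 1]`: the query matches itself, and `x + 0.1` qualifies because a constant offset survives the moving average unchanged, giving a distance of 0.1·√8 ≈ 0.28. The reversed sequence is rejected in the refinement step. Because the filter uses only a lower bound, it can admit false candidates but never discards a true answer.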

Index Terms:
Similarity queries, time series retrieval, indexing time series, Fourier transform.
Citation:
Davood Rafiei, Alberto O. Mendelzon, "Querying Time Series Data Based on Similarity," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 5, pp. 675-693, Sept.-Oct. 2000, doi:10.1109/69.877502