Skyline Index for Time Series Data
June 2004 (vol. 16 no. 6)
pp. 669-684

Abstract—We have developed a new indexing strategy that helps overcome the curse of dimensionality for time series data. Our proposed approach, called Skyline Index, adopts new Skyline Bounding Regions (SBR) to approximate and represent a group of time series data according to their collective shape. Skyline bounding regions allow us to define a distance function that tightly lower bounds the distance between a query and a group of time series data. In an extensive performance study, we investigate how the distance functions used by various dimensionality reduction and indexing techniques affect the performance of similarity search, measured by index pages accessed, data objects fetched, and overall query processing time. In addition, we show that, for k-nearest neighbor queries, the proposed Skyline Index approach can be coupled with state-of-the-art dimensionality reduction techniques such as Adaptive Piecewise Constant Approximation (APCA) and improve their performance by up to a factor of 3.
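
The idea of bounding a group of time series by its collective shape can be illustrated with a small sketch. It assumes the region is the pointwise maximum and minimum envelope of the group and that similarity is Euclidean distance; the abstract states neither detail, and the function names `skyline_region` and `region_distance` are hypothetical, not taken from the paper.

```python
import math

def skyline_region(group):
    """Pointwise envelope of equal-length series (assumed form of the region)."""
    top = [max(col) for col in zip(*group)]
    bottom = [min(col) for col in zip(*group)]
    return top, bottom

def region_distance(query, top, bottom):
    """Distance from the query to the envelope; zero wherever the query is inside."""
    total = 0.0
    for q, t, b in zip(query, top, bottom):
        if q > t:
            total += (q - t) ** 2
        elif q < b:
            total += (b - q) ** 2
    return math.sqrt(total)

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative data only, not from the paper's experiments.
group = [
    [1.0, 2.0, 3.0, 2.0],
    [2.0, 3.0, 2.0, 1.0],
    [1.5, 2.5, 4.0, 1.5],
]
query = [0.0, 2.5, 5.0, 3.0]

top, bottom = skyline_region(group)
lb = region_distance(query, top, bottom)

# The envelope distance never exceeds the true distance to any member,
# so a whole group can be skipped when lb already exceeds the search radius.
for series in group:
    assert lb <= euclidean(query, series)
```

Because every member lies between the two envelope curves at each time point, each per-point term of the envelope distance is no larger than the corresponding term of the true distance, which is why the bound holds under these assumptions.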

Index Terms:
Data approximation, dimensionality reduction, similarity search, skyline bounding region, skyline index, time series data.
Citation:
Quanzhong Li, Inés Fernando Vega López, Bongki Moon, "Skyline Index for Time Series Data," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 6, pp. 669-684, June 2004, doi:10.1109/TKDE.2004.14