Novel Online Methods for Time Series Segmentation
December 2008 (vol. 20 no. 12)
pp. 1616-1626
Approximate representation of the data is one of the most commonly used strategies for mining massive amounts of time series data efficiently and effectively. Piecewise Linear Approximation is one such approach: it represents a time series by dividing it into segments and approximating each segment with a straight line. In this paper, we first propose a new segmentation criterion that improves computing efficiency. Based on this criterion, we develop two novel online piecewise linear segmentation methods: the feasible space window method and the stepwise feasible space window method. The former usually produces far fewer segments than other methods, and it is faster and has a more reliable running time. The latter can reduce the representation error with fewer segments, and it achieves the best overall segmentation results among the methods compared. Extensive experiments on a variety of real-world time series demonstrate the advantages of our methods.
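As a rough illustration of what online piecewise linear segmentation involves, the sketch below greedily grows each segment for as long as a straight line from the segment's first point can stay within a fixed vertical tolerance of every point seen so far. It is a generic example written for this summary and is not the paper's feasible space window or stepwise feasible space window method; the names `segment_online`, `reconstruct`, and `max_error` are hypothetical, and the maximum vertical deviation is an assumed error measure.

```python
from math import inf
from typing import List, Sequence


def segment_online(y: Sequence[float], max_error: float) -> List[int]:
    """Return breakpoint indices of a greedy piecewise linear approximation.

    Each segment joins two samples of y with a straight line. A point is
    accepted as a segment end only if that line stays within max_error
    (vertical distance) of every sample between the two endpoints.
    Illustrative sketch only; not the algorithm from the paper.
    """
    n = len(y)
    if n == 0:
        return []
    breakpoints = [0]
    start = 0
    while start < n - 1:
        lo, hi = -inf, inf        # slopes still compatible with all points seen
        candidate = start + 1     # the adjacent point is always a valid end
        i = start + 1
        while i < n:
            dt = i - start
            slope = (y[i] - y[start]) / dt
            if lo <= slope <= hi:
                candidate = i     # line start -> i fits all earlier points
            # Narrow the slope interval so that later lines also pass
            # within max_error of point i.
            lo = max(lo, (y[i] - max_error - y[start]) / dt)
            hi = min(hi, (y[i] + max_error - y[start]) / dt)
            if lo > hi:           # no line from start can cover point i too
                break
            i += 1
        breakpoints.append(candidate)
        start = candidate
    return breakpoints


def reconstruct(y: Sequence[float], breakpoints: Sequence[int]) -> List[float]:
    """Rebuild the approximation by interpolating between breakpoints."""
    if not breakpoints:
        return []
    approx = [float(y[breakpoints[0]])]
    for a, b in zip(breakpoints, breakpoints[1:]):
        for i in range(a + 1, b + 1):
            approx.append(y[a] + (y[b] - y[a]) * (i - a) / (b - a))
    return approx


if __name__ == "__main__":
    series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
    print(segment_online(series, max_error=0.1))  # [0, 3, 6]
```

Each sample is visited at most once per segment and only two slope bounds are kept, so the sketch needs constant memory per open segment, which is the property that makes this style of segmentation usable on streaming data.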

Index Terms:
Temporal databases, Data mining, Mining methods and algorithms, Information Storage
Xiaoyan Liu, Zhenjiang Lin, Huaiqing Wang, "Novel Online Methods for Time Series Segmentation," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 12, pp. 1616-1626, Dec. 2008, doi:10.1109/TKDE.2008.29