Efficient Periodicity Mining in Time Series Databases Using Suffix Trees
January 2011 (vol. 23 no. 1)
pp. 79-94
Faraz Rasheed, University of Calgary, Calgary
Mohammed Alshalalfa, University of Calgary, Calgary
Reda Alhajj, University of Calgary, Calgary
Periodic pattern mining, or periodicity detection, has a number of applications, such as prediction, forecasting, and detection of unusual activities. The problem is not trivial because the data to be analyzed are mostly noisy and different periodicity types (namely symbol, sequence, and segment) must be investigated. Accordingly, we argue that there is a need for a comprehensive approach capable of analyzing either the whole time series or a subsection of it, effectively handling different types of noise (to a certain degree), and at the same time detecting different types of periodic patterns; combining these under one umbrella is by itself a challenge. In this paper, we present an algorithm that can detect symbol, sequence (partial), and segment (full-cycle) periodicity in time series. The algorithm uses a suffix tree as the underlying data structure; this allows us to design the algorithm such that its worst-case complexity is O(k·n^2), where k is the maximum length of a periodic pattern and n is the length of the analyzed portion (whole or subsection) of the time series. The algorithm is noise resilient; it has been successfully demonstrated to work with replacement, insertion, deletion, or a mixture of these types of noise. We have tested the proposed algorithm on both synthetic and real data from different domains, including protein sequences. The conducted comparative study demonstrates the applicability and effectiveness of the proposed algorithm; it is generally more time-efficient and noise-resilient than existing algorithms.
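The abstract does not spell out the procedure, so the following is only a hedged Python sketch of the general idea behind suffix-tree-based periodicity detection, not the authors' algorithm. It rests on several assumptions. A depth-bounded suffix trie stands in for a real suffix tree; a practical implementation would typically build a compressed suffix tree in linear time, for example with Ukkonen's algorithm. Candidate periods are taken as the gaps between consecutive occurrences of a pattern. The confidence of a (pattern, period, start) triple is an assumed ratio: the share of slots start, start+p, start+2p, ... that actually hold the pattern. The names `TrieNode`, `build_suffix_trie`, `confidence`, and `find_periodic_patterns` are hypothetical. Only replacement noise is tolerated, through the confidence threshold; the paper's handling of insertion and deletion noise is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Children keyed by the next symbol of the path label.
    children: dict = field(default_factory=dict)
    # Ascending start positions of every occurrence of this node's label.
    positions: list = field(default_factory=list)


def build_suffix_trie(series: str, max_len: int) -> TrieNode:
    """Depth-bounded suffix trie, a naive stand-in for a suffix tree (assumption).

    Only the first max_len symbols of each suffix are inserted, so pattern
    labels never exceed max_len (playing the role of k in O(k*n^2)).
    """
    root = TrieNode()
    for start in range(len(series)):
        node = root
        for symbol in series[start:start + max_len]:
            node = node.children.setdefault(symbol, TrieNode())
            # Suffixes are inserted left to right, so positions stay sorted.
            node.positions.append(start)
    return root


def confidence(occ: set, n: int, length: int, period: int, start: int) -> float:
    """Assumed measure: share of slots start, start+p, ... holding the pattern."""
    slots = range(start, n - length + 1, period)
    if len(slots) == 0:
        return 0.0
    return sum(1 for pos in slots if pos in occ) / len(slots)


def find_periodic_patterns(series: str, max_len: int, threshold: float) -> list:
    """Return sorted (pattern, period, start, confidence) tuples >= threshold."""
    n = len(series)
    results = []
    stack = [("", build_suffix_trie(series, max_len))]
    while stack:  # iterative depth-first walk over the trie
        label, node = stack.pop()
        for symbol, child in node.children.items():
            stack.append((label + symbol, child))
        occ = node.positions
        if len(occ) < 2:  # the root and single occurrences yield no gap
            continue
        occ_set = set(occ)
        checked = set()
        # Candidate periods are the gaps between consecutive occurrences.
        for a, b in zip(occ, occ[1:]):
            period = b - a
            start = a % period  # earliest slot on the same periodic grid
            if (period, start) in checked:
                continue
            checked.add((period, start))
            conf = confidence(occ_set, n, len(label), period, start)
            if conf >= threshold:
                results.append((label, period, start, conf))
    return sorted(results)


if __name__ == "__main__":
    for pattern, period, start, conf in find_periodic_patterns("abcabbabcb", 3, 0.7):
        print(pattern, period, start, round(conf, 2))
```

On the series "abcabbabcb", this sketch reports symbol b with period 3 from position 1 at confidence 1.0, and symbol a with period 3 from position 0 at confidence 0.75 (a is missing from slot 9). Longer labels such as "ab" (period 3, confidence 1.0) loosely correspond to sequence periodicity. Bounding the trie depth keeps the number of nodes at O(n·k), though the confidence checks add further work per candidate period.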

Index Terms:
Time series, periodicity detection, suffix tree, symbol periodicity, segment periodicity, sequence periodicity, noise resilient.
Citation:
Faraz Rasheed, Mohammed Alshalalfa, Reda Alhajj, "Efficient Periodicity Mining in Time Series Databases Using Suffix Trees," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 1, pp. 79-94, Jan. 2011, doi:10.1109/TKDE.2010.76