Issue No.05 - May (2011 vol.23)

pp: 774-787

Thiemo Gruber, University of Passau, Passau

Dominik Fisch, University of Passau, Passau

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.161

ABSTRACT

In this article, we provide a new technique for temporal data mining which is based on classification rules that can easily be understood by human domain experts. Basically, time series are decomposed into short segments, and short-term trends of the time series within the segments (e.g., average, slope, and curvature) are described by means of polynomial models. Then, the classifiers assess short sequences of trends in subsequent segments with their rule premises. The conclusions gradually assign an input to a class. As the classifier is a generative model of the processes from which the time series are assumed to originate, anomalies can be detected, too. Segmentation and piecewise polynomial modeling are done extremely fast in only one pass over the time series. Thus, the approach is applicable to problems with harsh timing constraints. We lay the theoretical foundations for this classifier, including a new distance measure for time series and a new technique to construct a dynamic classifier from a static one, and demonstrate its properties by means of various benchmark time series, for example, Lorenz attractor time series, energy consumption in a building, or ECG data.
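The segment-wise trend description mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' SwiftRule implementation: it assumes fixed-length segments (the abstract does not say how segment boundaries are chosen) and uses an ordinary least-squares fit via `numpy.polyfit`. The function name `segment_trends` and the parameters `seg_len` and `degree` are hypothetical.

```python
import numpy as np

def segment_trends(series, seg_len=20, degree=2):
    """Fit one polynomial per fixed-length segment (illustrative assumption).

    Returns an array with one row per segment. Each row holds the fitted
    coefficients in increasing order of power; for degree 2 these are the
    level at the segment center, the slope, and the quadratic coefficient.
    """
    y = np.asarray(series, dtype=float)
    n_segments = len(y) // seg_len  # trailing samples that do not fill a segment are dropped
    # Center the local time axis so the constant term is the level at mid-segment.
    t = np.arange(seg_len) - (seg_len - 1) / 2.0
    rows = []
    for k in range(n_segments):
        chunk = y[k * seg_len:(k + 1) * seg_len]
        coeffs = np.polyfit(t, chunk, degree)  # polyfit returns highest power first
        rows.append(coeffs[::-1])              # reorder to increasing power
    return np.array(rows)

# Usage: a straight line has constant slope and (numerically) zero curvature.
line = 3.0 + 2.0 * np.arange(40)
trends = segment_trends(line, seg_len=20, degree=2)
```

Each row of `trends` is a compact description of one segment; a rule-based classifier of the kind the abstract describes would then evaluate short sequences of such rows.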

INDEX TERMS

Temporal data mining, time series classification, anomaly detection, piecewise polynomial representation, piecewise probabilistic representation, generative classifier, SwiftRule.

CITATION

Thiemo Gruber, Dominik Fisch, "SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis",

*IEEE Transactions on Knowledge & Data Engineering*, vol.23, no. 5, pp. 774-787, May 2011, doi:10.1109/TKDE.2010.161