Issue No. 06, June 2008 (vol. 20)
pp. 721-735
ABSTRACT
Organizations and firms are capturing increasing amounts of data about their customers, suppliers, competitors, and business environment. Most of this data is multi-attribute (multi-dimensional) and temporal in nature. Data mining and business intelligence techniques are often used to discover patterns in such data; however, mining temporal relationships is typically a complex task. We propose a new data analysis and visualization technique that uses a clustering-based approach to represent trends in multi-attribute temporal data. We introduce C-TREND, a system that implements the temporal cluster graph construct, which maps multi-attribute temporal data to a two-dimensional directed graph that identifies trends in dominant data types over time. In this paper, we present our temporal clustering-based technique, discuss its algorithmic implementation and performance, demonstrate applications of the technique by analyzing data on wireless networking technologies and baseball batting statistics, and introduce a set of metrics for further analysis of discovered trends.
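The abstract describes the temporal cluster graph only at a high level. The Python sketch below shows one plausible way to build such a graph, under assumptions that are ours rather than the paper's. Records are already grouped by integer time period, and each period is clustered independently, here with a naive centroid-linkage agglomerative method standing in for whatever clustering C-TREND actually uses. Each cluster becomes a node weighted by its record count. A directed edge joins clusters in consecutive periods when their centroids lie within a hypothetical `threshold`. The names `agglomerate` and `temporal_cluster_graph` are illustrative only, not C-TREND's API.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Node = Tuple[int, int]  # (time period, cluster index within that period)


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty sequence of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def agglomerate(points: List[Vector], k: int) -> List[List[Vector]]:
    """Naive centroid-linkage agglomerative clustering into at most k clusters.

    Stand-in clustering step (an assumption); roughly O(n^3), small inputs only.
    """
    clusters = [[p] for p in points]
    while len(clusters) > max(k, 1):
        best = None  # (distance, i, j) for the closest pair of clusters so far
        for i in range(len(clusters)):
            ci = centroid(clusters[i])
            for j in range(i + 1, len(clusters)):
                d = math.dist(ci, centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected by the pop
        clusters[i].extend(merged)
    return clusters


def temporal_cluster_graph(
    data: Dict[int, List[Vector]], k: int, threshold: float
) -> Tuple[Dict[Node, Tuple[int, Vector]], List[Tuple[Node, Node]]]:
    """Return (nodes, edges) of a temporal cluster graph.

    nodes maps (period, cluster) -> (record count, centroid). Edges point from a
    cluster in one period to a cluster in the next period present in `data`
    whenever their centroids lie within `threshold` (hypothetical edge rule).
    """
    periods = sorted(data)
    by_period: Dict[int, List[Tuple[int, Vector]]] = {}
    nodes: Dict[Node, Tuple[int, Vector]] = {}
    for t in periods:
        by_period[t] = []
        for c, members in enumerate(agglomerate(list(data[t]), k)):
            info = (len(members), centroid(members))
            nodes[(t, c)] = info
            by_period[t].append(info)
    edges: List[Tuple[Node, Node]] = []
    # Only adjacent periods are linked, so edges always point forward in time.
    for t, t_next in zip(periods, periods[1:]):
        for a, (_, ca) in enumerate(by_period[t]):
            for b, (_, cb) in enumerate(by_period[t_next]):
                if math.dist(ca, cb) <= threshold:
                    edges.append(((t, a), (t_next, b)))
    return nodes, edges


if __name__ == "__main__":
    data = {
        2001: [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)],
        2002: [(0.2, 0.1), (0.3, 0.0), (9.0, 9.0), (9.1, 9.0)],
    }
    nodes, edges = temporal_cluster_graph(data, k=2, threshold=1.0)
    print(edges)  # [((2001, 0), (2002, 0))]
```

In this toy run, the cluster near the origin persists from 2001 to 2002 and is linked by an edge. The 2001 cluster near (5, 5) has no successor within the threshold, which a reader of the graph would see as a trend ending. In this sketch, `k` controls how many dominant data types appear per period and `threshold` controls how readily clusters are linked across periods. Both are illustrative parameters, not settings documented for C-TREND.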
INDEX TERMS
Interactive data exploration and discovery, Data and knowledge visualization, Data mining, Clustering, classification, and association rules
CITATION
Gediminas Adomavicius, Jesse Bockstedt, "C-TREND: Temporal Cluster Graphs for Identifying and Visualizing Trends in Multiattribute Transactional Data", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 6, pp. 721-735, June 2008, doi:10.1109/TKDE.2008.31