DEMON: Mining and Monitoring Evolving Data
January/February 2001 (vol. 13 no. 1)
pp. 50-63

Abstract—Data mining algorithms have been the focus of much research recently. In practice, the input data to a data mining process resides in a large data warehouse whose data is kept up-to-date through periodic or occasional addition and deletion of blocks of data. Most data mining algorithms have either assumed that the input data is static, or have been designed for arbitrary insertions and deletions of data records. In this paper, we consider a dynamic environment that evolves through systematic addition or deletion of blocks of data. We introduce a new dimension, called the data span dimension, which allows user-defined selections of a temporal subset of the database. Taking this new degree of freedom into account, we describe efficient model maintenance algorithms for frequent itemsets and clusters. We then describe a generic algorithm that takes any traditional incremental model maintenance algorithm and transforms it into an algorithm that allows restrictions on the data span dimension. We also develop an algorithm for automatically discovering a specific class of interesting block selection sequences. In a detailed experimental study, we examine the validity and performance of our ideas on synthetic and real datasets.
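
The data span idea can be illustrated with a minimal Python sketch. It is not the maintenance algorithm described in the paper. It relies only on the fact that itemset support counts are additive across blocks, so keeping per-block counts lets a model over any selected subset of blocks be assembled without rescanning the data. The class name `BlockedItemsetCounts`, the `window` and `bss` parameters, the restriction to itemsets of at most two items, and the encoding of a block selection sequence as one 0/1 bit per block in the window are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def block_counts(block, max_size=2):
    """Support counts of every itemset of up to max_size items in one block."""
    counts = Counter()
    for txn in block:
        items = sorted(txn)
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


class BlockedItemsetCounts:
    """Hypothetical helper: per-block counts so any block subset can be mined.

    Illustration only; this is not the maintenance algorithm from the paper.
    """

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.blocks = []  # one (Counter, number_of_transactions) pair per block

    def add_block(self, block):
        # Systematic addition of a new block: summarize it once, keep the summary.
        self.blocks.append((block_counts(block, self.max_size), len(block)))

    def frequent(self, minsup, window=None, bss=None):
        """Frequent itemsets over the selected blocks.

        window: assumed data span; use only the most recent `window` blocks
                (None means all blocks).
        bss:    assumed block selection sequence, one 0/1 bit per block in the span.
        """
        if window is None:
            span = self.blocks
        else:
            span = self.blocks[max(0, len(self.blocks) - window):]
        if bss is None:
            bss = [1] * len(span)
        if len(bss) != len(span):
            raise ValueError("need one selection bit per block in the data span")
        total, n = Counter(), 0
        for (counts, size), bit in zip(span, bss):
            if bit:
                total.update(counts)  # Counter.update adds counts
                n += size
        return {s: c for s, c in total.items() if n and c / n >= minsup}


db = BlockedItemsetCounts()
db.add_block([{"a", "b"}, {"a", "c"}])  # block 1
db.add_block([{"a", "b"}, {"b"}])       # block 2
db.add_block([{"c"}, {"c", "b"}])       # block 3
print(db.frequent(0.5, window=2))
print(db.frequent(0.5, window=2, bss=[1, 0]))
```

Here `frequent(0.5, window=2)` mines only the two most recent blocks, and `bss=[1, 0]` further restricts that window to its older block. Storing counts for every itemset is only practical for toy inputs; a realistic version would keep counts for candidate itemsets only.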

Index Terms:
Data mining, dynamic databases, evolving data, trends.
Citation:
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, "DEMON: Mining and Monitoring Evolving Data," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 1, pp. 50-63, Jan.-Feb. 2001, doi:10.1109/69.908980