Issue No.07 - July (2008 vol.20)
pp: 911-923
In data mining and knowledge discovery, pattern discovery extracts previously unknown regularities from data and is a useful tool for categorical data analysis. However, the number of discovered patterns is often overwhelming, which makes it difficult and time-consuming to 1) interpret the patterns and 2) use them to further analyze the data set. To overcome these problems, this paper proposes a new method that clusters patterns and their associated data simultaneously. When patterns are clustered, the data containing those patterns are clustered as well, and the relation between patterns and data is made explicit. This explicit relation allows the user, on the one hand, to further analyze each pattern cluster via its associated data cluster and, on the other hand, to interpret why a data cluster is formed via its corresponding pattern cluster. Since the effectiveness of clustering depends mainly on the distance measure, several distance measures between patterns and their associated data are proposed, and their relationships to existing common measures are discussed. Once pattern clusters and their associated data clusters are obtained, each can be analyzed further individually. To evaluate the effectiveness of the proposed approach, experimental results on synthetic and real data are reported.
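To make the idea of clustering patterns together with their associated data concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each pattern is a set of attribute-value pairs, that a pattern's associated data are the records containing all of its pairs, and that the distance between two patterns is the Jaccard distance between their sets of matched records. The abstract does not specify the proposed distance measures, so Jaccard distance, the single-linkage merge rule, the `threshold` parameter, and all function names here are illustrative assumptions.

```python
from itertools import combinations


def matched_rows(pattern, records):
    """Indices of the records that contain every (attribute, value) pair of the pattern."""
    return {
        i
        for i, rec in enumerate(records)
        if all(rec.get(attr) == val for attr, val in pattern.items())
    }


def jaccard_distance(rows_a, rows_b):
    """Assumed distance: 1 - |A ∩ B| / |A ∪ B| over the matched record sets."""
    union = rows_a | rows_b
    if not union:
        return 0.0
    return 1.0 - len(rows_a & rows_b) / len(union)


def cluster_patterns(patterns, records, threshold=0.5):
    """Single-linkage agglomerative merge of patterns (hypothetical stand-in method).

    Returns a list of (pattern indices, record indices) pairs, so each
    pattern cluster carries its associated data cluster explicitly.
    """
    rows = [matched_rows(p, records) for p in patterns]
    clusters = [{i} for i in range(len(patterns))]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: distance of the closest pair of member patterns.
            d = min(
                jaccard_distance(rows[i], rows[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if d <= threshold:
                clusters[a] |= clusters[b]
                del clusters[b]
                merged = True
                break  # cluster list changed, restart the scan
    return [
        (sorted(c), sorted(set().union(*(rows[i] for i in c))))
        for c in clusters
    ]


# Toy categorical data set (made up for illustration only).
records = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "large"},
    {"color": "red", "shape": "square", "size": "small"},
    {"color": "blue", "shape": "square", "size": "large"},
    {"color": "blue", "shape": "square", "size": "large"},
]
patterns = [
    {"color": "red", "shape": "round"},
    {"shape": "round"},
    {"color": "blue", "size": "large"},
    {"shape": "square", "size": "large"},
]

result = cluster_patterns(patterns, records)
for pattern_ids, row_ids in result:
    print("patterns", pattern_ids, "-> records", row_ids)
```

In this toy example, each output line pairs a group of pattern indices with the record indices those patterns cover, which is the explicit pattern-to-data relation the abstract describes: a pattern cluster can be examined through its records, and a group of records can be explained by the patterns that produced it.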
Clustering, classification, and association rules, Similarity measures, Data mining
Andrew K.C. Wong, Gary C.L. Li, "Simultaneous Pattern and Data Clustering for Pattern Cluster Analysis", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 7, pp. 911-923, July 2008, doi:10.1109/TKDE.2008.38