GHIC: A Hierarchical Pattern-Based Clustering Algorithm for Grouping Web Transactions
September 2005 (vol. 17 no. 9)
pp. 1300-1304
Grouping customer transactions into segments may help us understand customers better. The marketing literature has concentrated on identifying important segmentation variables (e.g., customer loyalty) and on using cluster analysis and mixture models for segmentation. The data mining literature has provided various clustering algorithms for segmentation without focusing specifically on clustering customer transactions. Building on the notion that observable customer transactions are generated by latent behavioral traits, in this paper we investigate a pattern-based clustering approach to grouping customer transactions. We define an objective function that we maximize to achieve a good clustering of customer transactions, and we present an algorithm, GHIC, that groups customer transactions such that the itemsets generated from each cluster are similar to each other yet different from those generated from other clusters. Experimental results on user-centric Web usage data demonstrate that GHIC generates a highly effective clustering of transactions.
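The abstract does not spell out GHIC's objective function, so what follows is only a hypothetical sketch of the general idea of pattern-based, hierarchical (divisive) clustering of transactions. Each transaction is a set of items. Each candidate cluster is summarized by its frequent itemsets. A two-way split is scored higher when each side's patterns have high support and when the two sides' pattern supports differ. The helper names (`frequent_itemsets`, `difference`, `cohesion`, `objective`, `best_split`), the support threshold, the restriction to itemsets of size at most 2, the scoring formula, and the "split on one item" candidate generation are all our own assumptions, not the definitions used in the paper.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.3, max_size=2):
    """Return {itemset: support} for itemsets of size 1..max_size whose support
    (fraction of transactions containing them) is at least min_support.
    Brute-force enumeration; fine for small illustrative data only."""
    n = len(transactions)
    if n == 0:
        return {}
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def difference(fa, fb):
    """Average relative support gap over the union of two pattern sets.
    0 means identical supports; 1 means no shared patterns."""
    keys = set(fa) | set(fb)
    if not keys:
        return 0.0
    total = 0.0
    for s in keys:
        a, b = fa.get(s, 0.0), fb.get(s, 0.0)
        total += abs(a - b) / (a + b)  # a + b > 0: s is frequent on at least one side
    return total / len(keys)


def cohesion(f):
    """Mean support of a cluster's frequent itemsets (higher = more shared patterns)."""
    return sum(f.values()) / len(f) if f else 0.0


def objective(c1, c2, min_support=0.3):
    # Hypothetical scoring, NOT the formula from the GHIC paper:
    # reward between-cluster pattern difference plus within-cluster cohesion.
    f1 = frequent_itemsets(c1, min_support)
    f2 = frequent_itemsets(c2, min_support)
    return difference(f1, f2) + cohesion(f1) + cohesion(f2)


def best_split(transactions, min_support=0.3):
    """One greedy divisive step: for each item, split transactions by whether they
    contain it, and keep the highest-scoring split. Applying this recursively to
    each side yields a hierarchy of clusters. Returns None if no split exists."""
    best = None
    for item in sorted(set().union(*transactions)):
        inside = [t for t in transactions if item in t]
        outside = [t for t in transactions if item not in t]
        if not inside or not outside:
            continue
        score = objective(inside, outside, min_support)
        if best is None or score > best[0]:
            best = (score, item, inside, outside)
    return best


# Toy "Web sessions": sets of site categories visited (made-up illustrative data).
web = [
    {"news", "sports"}, {"news", "sports", "weather"}, {"sports", "news"},
    {"shop", "cart"}, {"shop", "cart", "checkout"}, {"cart", "shop"},
]
score, item, left, right = best_split(web)
print(len(left), len(right))  # → 3 3
```

On this toy data the chosen split separates the news-browsing sessions from the shopping sessions. Splitting on a rare item such as `checkout` isolates a single session and scores lower. A real implementation would use an efficient frequent-itemset miner and the paper's own objective in place of these placeholder choices.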


Index Terms: Data mining, clustering, classification, association rules, Web mining.
Yinghui Yang, Balaji Padmanabhan, "GHIC: A Hierarchical Pattern-Based Clustering Algorithm for Grouping Web Transactions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 9, pp. 1300-1304, Sept. 2005, doi:10.1109/TKDE.2005.145