This Article 
 Bibliographic References 
 Add to: 
Compressed Hierarchical Mining of Frequent Closed Patterns from Dense Data Sets
September 2007 (vol. 19 no. 9)
pp. 1175-1187
This paper addresses the problem of finding frequent closed patterns (FCPs) from very dense datasets. We introduce two compressed hierarchical FCP mining algorithms C-Miner and B-Miner. The two algorithms compress the original mining space, hierarchically partition the whole mining task into independent subtasks and mine each subtask progressively. The two algorithms adopt different task-partitioning strategies: CMiner partitions the mining task based on Compact Matrix Division whereas B-Miner partitions the task based on Base Rows Projection. The compressed hierarchical mining algorithms enhance the mining efficiency and facilitate a progressive refinement of results. Moreover, because the subtasks can be mined independently, C-Miner and B-Miner can be readily parallelized without incurring significant communication overhead. We have implemented C-Miner and B-Miner, and our performance study on synthetic datasets and real dense microarray datasets shows their effectiveness over existing schemes. We also report experimental results on parallel versions of these two methods.

[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int'l Conf. Very Large Data Bases (VLDB '94), pp. 487-499, 1994.
[2] J. Besson, C. Robardet, and J.-F. Boulicaut, “Constraint-Based Mining of Formal Concepts in Transactional Data,” Proc. Eighth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD '04), pp. 615-624, 2004.
[3] G. Cong, K.L. Tan, A.K.H. Tung, and F. Pan, “Mining Frequent Closed Patterns in Microarray Data,” Proc. Fourth IEEE Int'l Conf. Data Mining (ICDM '04), pp.363-366, 2004.
[4] M. El-Hajj and O.R. Zaiane, “Parallel Association Rule Mining with Minimum Inter-Processor Communication,” Proc. 14th Int'l Workshop Database and Expert Systems Applications (DEXA '03), pp.519-523, 2003.
[5] H. Mannila, H. Toivonen, and A.I. Verkamo, “Efficient Algorithms for Discovering Association Rules,” Proc. 12th Nat'l Conf. Artificial Intelligence (AAAI '94) Workshop Knowledge Discovery in Databases (KDD '94), pp. 181-192, 1994.
[6] F. Pan, G. Cong, and A.K.H. Tung, “Carpenter: Finding Closed Patterns in Long Biological Datasets,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '03), pp. 642-673, 2003.
[7] F. Pang, A.K.H. Tung, G. Cong, and X. Xu, “Cobbler: Combining Column and Row Enumeration for Closed Pattern Discovery,” Proc. 16th Int'l Conf. Scientific and Statistical Database Management (SSDBMS '04), pp. 21-30, 2004.
[8] J. Pei, J. Han, and R. Mao, “Closet: An Efficient Algorithm for Mining Frequent Closed Itemsets,” Proc. SIGMOD Int'l Workshop Data Mining and Knowledge Discovery, pp. 21-30, 2000.
[9] P. Shenoy, J. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, and D. Shah, “Turbo-Charging Vertical Mining of Large Databases,” Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '00), pp.22-23, 2000.
[10] J. Wang, J. Han, and J. Pei, “CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Itemsets,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '03), pp. 236-245, 2003.
[11] R.P.G. Yudho Giri Sucahyo and A. Rudra, “Efficiently Mining Frequent Patterns from Dense Datasets Using a Cluster of Computers,” Proc. 16th Australian Joint Conf. Artificial Intelligence (AI '03), pp. 233-244, 2003.
[12] M. Zaki and C. Hsiao, “CHARM: An Efficient Algorithm for Closed Association Rule Mining,” Proc. Second SIAM Int'l Conf. Data Mining (SDM '02), 2002.
[13] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, “New Algorithms for Fast Discovery of Association Rules,” Proc. Third Int'l Conf. Knowledge Discovery and Data Mining (KDD '97), pp. 283-286, 1997.
[14] M.J. Zaki, “Parallel and Distributed Association Mining: A Survey,” IEEE Concurrency, special issue on data mining, pp. 14-25, 1999.
[15] U. Takeaki, K. Masashi, and A. Hiroki, “LCM ver. 3: Collaboration of Array, Bitmap and Prefix Tree for Frequent Itemset Mining,” Proc. First Int'l Workshop Open Source Data Mining: Frequent Pattern Mining Implementations (OSDM '05), pp. 77-86, 2005.
[16] C. Creighton and S. Hanash, “Mining Gene Expression Database for Association Rules,” Bioinformatics, vol. 19, pp. 79-86, 2003.
[17] G. Cong, A.K.H. Tung, X. Xu, F. Pan, and J. Yang, “FARMER: Finding Interesting Rule Groups in Microarray Datasets,” Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '04), 2004.
[18] L. Ji, “Mining Localized Co-Expressed Gene Patterns from Microarray Data,” PhD dissertation, http://www.comp.nus., 2006.

Index Terms:
Frequent closed patterns, progressive, dense datasets, data mining, parallel mining
Liping Ji, Kian-Lee Tan, Anthony Tung, "Compressed Hierarchical Mining of Frequent Closed Patterns from Dense Data Sets," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 9, pp. 1175-1187, Sept. 2007, doi:10.1109/TKDE.2007.1047
Usage of this product signifies your acceptance of the Terms of Use.