Issue No. 06, June 2008 (vol. 20)
pp. 784-795
To implement Apriori-based association rule mining in hardware, one generally has to load both the candidate itemsets and the database into the hardware. A large number of candidate itemsets and a large database create a performance bottleneck. In this paper, we propose a HAsh-based and PiPelIned architecture (abbreviated as HAPPI) for hardware-enhanced association rule mining. HAPPI applies a pipeline methodology to compare itemsets with the database while collecting information that reduces both the number of candidate itemsets and the number of items in the database. As the database is fed into the hardware, candidate itemsets are compared with the items in each transaction to find frequent itemsets, and trimming information is collected from each transaction at the same time. This effectively reduces how often the database must be loaded into the hardware, which is how HAPPI addresses the bottleneck of Apriori-based hardware schemes. We also derive several properties to analyze the performance of this hardware implementation. Experimental results show that HAPPI significantly outperforms the previous hardware approach in terms of execution cycles.
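The abstract describes a single database scan that both counts candidate supports and gathers trimming information. The Python snippet below is a minimal software sketch of that idea, not the HAPPI hardware itself: it models neither the pipeline nor cycle counts. It assumes a trimming rule and hash-based candidate filter in the style of the Direct Hashing and Pruning (DHP) algorithm, and all names (`bucket_of`, `mining_pass`, `next_candidates`, `mine`) are hypothetical. The trimming rule used here: an item can belong to a frequent (k+1)-itemset of a transaction only if it appears in at least k of the candidate k-itemsets contained in that transaction.

```python
from collections import Counter
from itertools import combinations


def bucket_of(itemset, num_buckets):
    # Hypothetical hash function; any deterministic hash of the itemset works.
    return hash(tuple(sorted(itemset))) % num_buckets


def mining_pass(transactions, candidates, k, min_support, num_buckets=97):
    """Level-k pass: count candidate supports and, in the same scan,
    collect trimming information and (k+1)-itemset bucket counts."""
    support = Counter()
    buckets = [0] * num_buckets
    trimmed_db = []
    for items in transactions:
        hits = Counter()  # number of contained k-candidates each item occurs in
        for c in candidates:
            if c <= items:
                support[c] += 1
                hits.update(c)
        # Assumed DHP-style rule: keep an item only if it occurs in at least
        # k of the candidates contained in this transaction.
        kept = frozenset(i for i in items if hits[i] >= k)
        if len(kept) > k:  # too short transactions cannot hold a (k+1)-itemset
            trimmed_db.append(kept)
            for sub in combinations(sorted(kept), k + 1):
                buckets[bucket_of(sub, num_buckets)] += 1
    frequent = {c for c in candidates if support[c] >= min_support}
    return frequent, support, trimmed_db, buckets


def next_candidates(frequent, k, buckets, min_support):
    """Apriori join and subset pruning, followed by a hash-bucket filter."""
    result = set()
    for a, b in combinations(list(frequent), 2):
        union = a | b
        if len(union) != k + 1 or union in result:
            continue
        all_subsets_frequent = all(
            frozenset(s) in frequent for s in combinations(union, k)
        )
        # A bucket count is an upper bound on the support of a frequent itemset
        # hashed into it, so low-count buckets can be pruned safely.
        if all_subsets_frequent and buckets[bucket_of(union, len(buckets))] >= min_support:
            result.add(union)
    return result


def mine(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support."""
    db = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in db for i in t}
    k, all_frequent = 1, {}
    while candidates and db:
        frequent, support, db, buckets = mining_pass(db, candidates, k, min_support)
        all_frequent.update({c: support[c] for c in frequent})
        candidates = next_candidates(frequent, k, buckets, min_support)
        k += 1
    return all_frequent
```

For example, `mine([[1, 2, 3], [1, 2], [2, 3], [1, 2, 3, 4]], min_support=2)` returns seven frequent itemsets, including `{1, 2, 3}` with support 2. Each pass scans the already trimmed database, which mirrors in software the abstract's point that collecting trimming information during the scan shrinks the data loaded in later passes.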
Keywords: data mining, association rules, hardware-enhanced mining
Ying-Hsiang Wen, Jen-Wei Huang, Ming-Syan Chen, "Hardware-Enhanced Association Rule Mining with Hashing and Pipelining," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 6, pp. 784-795, June 2008, doi:10.1109/TKDE.2008.39