Using a Hash-Based Method with Transaction Trimming for Mining Association Rules
September-October 1997 (vol. 9 no. 5)
pp. 813-825

Abstract: In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means, given a database of sales transactions, discovering all associations among items such that the presence of some items in a transaction implies the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appears in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally, this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate sets in early iterations is usually the dominating factor in overall data-mining performance. To address this issue, we develop an effective algorithm for candidate set generation. It is a hash-based algorithm and is especially effective for generating the candidate set for large 2-itemsets. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby significantly reducing the computational cost of later iterations. The advantage of the proposed algorithm also provides us the opportunity to reduce the amount of disk I/O required. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
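
The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea rather than the paper's exact procedure. It assumes items are small integers, uses a hypothetical hash function `bucket_of`, and applies one plausible trimming rule (an item is kept for the next pass only if it occurs in at least two candidate pairs of its transaction). The function name `mine_large_2_itemsets` and all parameters are illustrative choices, not taken from the source.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical hash function; the abstract does not specify one.
    a, b = pair
    return (a * 10 + b) % num_buckets


def mine_large_2_itemsets(transactions, min_support, num_buckets=7):
    """Return (large 2-itemsets, candidate 2-itemsets, trimmed transactions)."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    # Pass 1: count single items and hash every 2-subset into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    large_1 = sorted(i for i, c in item_counts.items() if c >= min_support)

    # Candidate generation: a pair of large items is a candidate only if
    # its hash bucket was hit at least min_support times in pass 1.
    candidates = {
        pair
        for pair in combinations(large_1, 2)
        if bucket_counts[bucket_of(pair, num_buckets)] >= min_support
    }

    # Pass 2: count the candidates and trim each transaction.
    pair_counts = Counter()
    trimmed = []
    for t in transactions:
        items = sorted(set(t))
        hits = [p for p in combinations(items, 2) if p in candidates]
        pair_counts.update(hits)
        # Assumed trimming rule: an item can belong to a 3-itemset only if
        # it occurs in at least two candidate pairs of this transaction.
        occurrences = Counter(i for p in hits for i in p)
        kept = [i for i in items if occurrences[i] >= 2]
        if len(kept) >= 3:
            trimmed.append(kept)

    large_2 = {p: c for p, c in pair_counts.items() if c >= min_support}
    return large_2, candidates, trimmed
```

As an illustration, calling `mine_large_2_itemsets([[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]], 2)` leaves four candidate pairs after the bucket filter, whereas pairing the four large items without hashing would give six. The trimmed database for the next pass shrinks from four transactions to two.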

Index Terms:
Data mining, association rules, hashing, performance analysis.
Jong Soo Park, Ming-Syan Chen, Philip S. Yu, "Using a Hash-Based Method with Transaction Trimming for Mining Association Rules," IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 5, pp. 813-825, Sept.-Oct. 1997, doi:10.1109/69.634757