Issue No. 03 - May/June (2002 vol. 14)
<p><b>Abstract</b>—Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from frequent 1-itemsets (the minimum length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform well when all maximal frequent itemsets are short. However, performance drastically deteriorates when some of the maximal frequent itemsets are long. We present a new algorithm which combines both the bottom-up and the top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure, the maximum frequent candidate set. It is used to prune early candidates that would be normally encountered in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. Therefore, the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, thus specifying immediately all frequent itemsets. We evaluate the performance of the algorithm using well-known synthetic benchmark databases, real-life census, and stock market databases. The improvement in performance can be up to several orders of magnitude, compared to the best previous algorithms.</p>
Data mining, knowledge discovery, association rule, maximum frequent set, Pincer Search, maximum frequent candidate set
D. Lin and Z. Kedem, "Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set," in IEEE Transactions on Knowledge & Data Engineering, vol. 14, no. , pp. 553-566, 2002.