Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set
May/June 2002 (vol. 14 no. 3)
pp. 553-566

Abstract: Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from frequent 1-itemsets (the minimum length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform well when all maximal frequent itemsets are short. However, performance drastically deteriorates when some of the maximal frequent itemsets are long. We present a new algorithm which combines both the bottom-up and the top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure, the maximum frequent candidate set. It is used to prune, at an early stage, candidates that would normally be encountered in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. Therefore, the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, thus specifying immediately all frequent itemsets. We evaluate the performance of the algorithm using well-known synthetic benchmark databases as well as real-life census and stock market databases. The improvement in performance can be up to several orders of magnitude, compared to the best previous algorithms.
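The two-directional idea in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: it omits their candidate-generation and recovery procedures and, as a simplifying assumption, builds the level-k candidates by enumerating k-subsets of the unresolved elements of the maximum frequent candidate set (MFCS), which is only practical for tiny inputs. All names (`support`, `update_mfcs`, `maximum_frequent_set`, `min_count`) are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def update_mfcs(mfcs, infrequent):
    """Shrink the MFCS so that no element contains a known-infrequent itemset."""
    for s in infrequent:
        for m in [m for m in mfcs if s <= m]:
            mfcs.discard(m)
            for e in s:
                smaller = m - {e}
                # Keep only maximal elements, so the MFCS stays an antichain.
                if smaller and not any(smaller <= other for other in mfcs):
                    mfcs.add(smaller)


def maximum_frequent_set(transactions, min_count):
    """Return the set of maximal frequent itemsets (illustrative sketch only)."""
    transactions = [frozenset(t) for t in transactions]
    items = frozenset().union(*transactions)
    mfcs = {items} if items else set()  # top-down state: one all-items candidate
    mfs = set()                         # maximal frequent itemsets found so far
    k = 1                               # current bottom-up level
    while any(m not in mfs for m in mfcs):
        # Restricted top-down step: count only the unresolved MFCS elements.
        for m in list(mfcs):
            if m not in mfs and support(m, transactions) >= min_count:
                mfs.add(m)
        # Bottom-up step (assumption of this sketch): the level-k candidates
        # are the k-subsets of unresolved MFCS elements. Subsets of an itemset
        # already known to be frequent are skipped without being counted.
        unresolved = [m for m in mfcs if m not in mfs]
        candidates = {frozenset(c) for m in unresolved for c in combinations(m, k)}
        candidates = {c for c in candidates if not any(c <= f for f in mfs)}
        infrequent = [c for c in candidates if support(c, transactions) < min_count]
        # Infrequent itemsets found bottom-up refine the top-down candidates.
        update_mfcs(mfcs, infrequent)
        k += 1
    return mfs


example = [{"a", "b", "c", "d"}, {"a", "b", "c", "d"}, {"a", "b", "c"},
           {"b", "d", "e"}, {"a", "e"}]
result = maximum_frequent_set(example, min_count=2)
print(sorted(sorted(m) for m in result))  # [['a', 'b', 'c', 'd'], ['e']]
```

In this toy run the long itemset {a, b, c, d} is confirmed by a single top-down count at the third pass, so its 3-item subsets are never counted individually. Every frequent itemset is then a subset of one of the two returned itemsets.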


Index Terms:
Data mining, knowledge discovery, association rule, maximum frequent set, Pincer Search, maximum frequent candidate set
Citation:
D.-I. Lin, Z.M. Kedem, "Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, pp. 553-566, May-June 2002, doi:10.1109/TKDE.2002.1000342