
D.I. Lin and Z.M. Kedem, "Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, pp. 553-566, May/June 2002, doi:10.1109/TKDE.2002.1000342.
Keywords: Data mining, knowledge discovery, association rule, maximum frequent set, Pincer-Search, maximum frequent candidate set.
Abstract: Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from frequent 1-itemsets (the minimum length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform well when all maximal frequent itemsets are short. However, performance drastically deteriorates when some of the maximal frequent itemsets are long. We present a new algorithm which combines both the bottom-up and the top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure, the maximum frequent candidate set. It is used to prune early candidates that would be normally encountered in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. Therefore, the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, thus specifying immediately all frequent itemsets. We evaluate the performance of the algorithm using well-known synthetic benchmark databases, real-life census, and stock market databases. The improvement in performance can be up to several orders of magnitude, compared to the best previous algorithms.
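The two-directional idea in the abstract can be illustrated with a small Python sketch. It is a simplified illustration, not the authors' published procedure. The names `support`, `update_mfcs`, and `pincer_search_sketch` are hypothetical, transactions are assumed to be in-memory sets, and the minimum support is assumed to be an absolute count. The sketch skips support counting for any candidate already covered by a known maximal frequent itemset, but it still enumerates such candidates level by level, so it shows the pruning idea without reproducing the full savings claimed for the algorithm.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def update_mfcs(mfcs, infrequent):
    """Shrink the maximum frequent candidate set (MFCS) in place so that
    no element contains a known-infrequent itemset."""
    for s in infrequent:
        for m in [m for m in mfcs if s <= m]:
            mfcs.remove(m)
            for item in s:
                smaller = m - {item}
                # Keep only non-empty sets that are maximal within MFCS.
                if smaller and not any(smaller <= other for other in mfcs):
                    mfcs.add(smaller)


def pincer_search_sketch(transactions, min_count):
    """Return the maximal frequent itemsets (simplified, hypothetical sketch)."""
    transactions = [frozenset(t) for t in transactions]
    items = frozenset().union(*transactions)
    mfcs = {items} if items else set()   # top-down structure
    mfs = set()                          # maximal frequent itemsets found
    counted = set()                      # MFCS elements already counted
    candidates = {frozenset([i]) for i in items}
    k = 1
    while candidates:
        # Bottom-up step: classify the level-k candidates. A candidate that
        # is a subset of a known maximal frequent itemset needs no counting.
        frequent, infrequent = set(), set()
        for c in candidates:
            if any(c <= m for m in mfs) or support(c, transactions) >= min_count:
                frequent.add(c)
            else:
                infrequent.add(c)
        # Top-down step: refine MFCS, then count its new elements. A frequent
        # MFCS element is maximal, because every frequent itemset is a subset
        # of some MFCS element and no MFCS element contains another.
        update_mfcs(mfcs, infrequent)
        for m in mfcs - counted:
            counted.add(m)
            if support(m, transactions) >= min_count:
                mfs.add(m)
        # Next level: join frequent k-itemsets and keep only the candidates
        # that are still covered by some MFCS element.
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {c for c in joined if any(c <= m for m in mfcs)}
    return mfs


if __name__ == "__main__":
    demo = [set("abcd"), set("abcd"), set("abcde"), set("e"), set("ae")]
    print(sorted(sorted(m) for m in pincer_search_sketch(demo, 2)))
    # prints [['a', 'b', 'c', 'd'], ['a', 'e']]
```

In the demo, the long itemset {a, b, c, d} is confirmed by the top-down step after the second level, so its 3-item and 4-item subsets are classified as frequent without any support counting. This is the situation the abstract describes, where a purely bottom-up search would count every subset of a long maximal itemset.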