
Mohammed J. Zaki, "Scalable Algorithms for Association Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 3, pp. 372-390, May/June 2000. DOI: 10.1109/69.846291. ISSN: 1041-4347. Publisher: IEEE Computer Society, Los Alamitos, CA, USA.
Index Terms: Association rules, frequent itemsets, equivalence classes, maximal cliques, lattices, data mining.
Abstract: Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets and then forming conditional implication rules among them. In this paper, we present efficient algorithms for the discovery of frequent itemsets, which forms the compute-intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The items are organized into a subset lattice search space, which is decomposed into small independent chunks, or sublattices, that can be solved in memory. Efficient lattice traversal techniques are presented, which quickly identify all the long frequent itemsets and, if required, their subsets. We also present the effect of using different database layout schemes combined with the proposed decomposition and traversal techniques. We experimentally compare the new algorithms against the previous approaches, obtaining improvements of more than an order of magnitude for our test databases.
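The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: split the itemset lattice into independent sublattices and solve each one in memory. It assumes a vertical database layout (each item mapped to the set of transaction ids that contain it) and prefix-based classes; both choices, and all function names (`vertical_layout`, `mine_class`, `frequent_itemsets`), are illustrative assumptions rather than the paper's actual algorithms. The sketch enumerates all frequent itemsets and does not attempt the traversal techniques for long itemsets mentioned above.

```python
from typing import Dict, Iterable, List, Set, Tuple

Itemset = Tuple[str, ...]


def vertical_layout(transactions: Iterable[Set[str]]) -> Dict[str, Set[int]]:
    """Map each item to the set of transaction ids that contain it."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def mine_class(members: List[Tuple[Itemset, Set[int]]],
               min_support: int,
               out: Dict[Itemset, int]) -> None:
    """Mine one sublattice: itemsets in `members` share a common prefix.

    Each member is extended only with members that follow it, so every
    class can be processed independently of the others.
    """
    for i, (itemset_a, tids_a) in enumerate(members):
        out[itemset_a] = len(tids_a)
        next_class: List[Tuple[Itemset, Set[int]]] = []
        for itemset_b, tids_b in members[i + 1:]:
            # Support of the joined itemset is the size of the intersection.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_class.append((itemset_a + (itemset_b[-1],), tids))
        if next_class:
            mine_class(next_class, min_support, out)


def frequent_itemsets(transactions: Iterable[Set[str]],
                      min_support: int) -> Dict[Itemset, int]:
    """Return every itemset whose support count is at least `min_support`."""
    tidsets = vertical_layout(transactions)
    roots = sorted(
        (((item,), tids) for item, tids in tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    out: Dict[Itemset, int] = {}
    mine_class(roots, min_support, out)
    return out


if __name__ == "__main__":
    # Hypothetical toy database, not data from the paper.
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(frequent_itemsets(db, min_support=3).items()):
        print(itemset, support)
```

In this toy database every single item and every pair reaches the support threshold of 3, while the triple `("a", "b", "c")` appears in only two transactions and is pruned. Because each recursive call only touches the transaction-id sets of its own class, a class can in principle be loaded and solved without reference to the rest of the lattice.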