Scalable Algorithms for Association Mining
May/June 2000 (vol. 12 no. 3)
pp. 372-390

Abstract: Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets and then forming conditional implication rules among them. In this paper, we present efficient algorithms for the discovery of frequent itemsets, which is the compute-intensive phase of the task. The algorithms exploit the structural properties of frequent itemsets to facilitate fast discovery. The items are organized into a subset lattice search space, which is decomposed into small independent chunks, or sublattices, each of which can be solved in memory. We present efficient lattice traversal techniques that quickly identify all the long frequent itemsets and, if required, their subsets. We also study the effect of combining different database layout schemes with the proposed decomposition and traversal techniques. We experimentally compare the new algorithms against previous approaches, obtaining improvements of more than an order of magnitude on our test databases.
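
The decomposition idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a vertical layout (each item mapped to the set of transaction ids containing it) and assumes the lattice is split by common prefix, so that each sublattice can be searched using only its own id sets. The function names `vertical_layout`, `mine_class`, and `frequent_itemsets` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def vertical_layout(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def mine_class(prefix_class, min_support, out):
    """Depth-first search of one sublattice (assumed prefix-based split).

    prefix_class is a list of (itemset, tidset) pairs that share a prefix.
    Each recursive call touches only the tidsets of its own class.
    """
    for i, (itemset_a, tids_a) in enumerate(prefix_class):
        out[itemset_a] = len(tids_a)  # support = size of the tidset
        next_class = []
        for itemset_b, tids_b in prefix_class[i + 1:]:
            tids = tids_a & tids_b  # support counting by intersection
            if len(tids) >= min_support:
                next_class.append((itemset_a + (itemset_b[-1],), tids))
        if next_class:
            mine_class(next_class, min_support, out)


def frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support} for all frequent itemsets."""
    tidsets = vertical_layout(transactions)
    root = [((item,), tids)
            for item, tids in sorted(tidsets.items())
            if len(tids) >= min_support]
    out = {}
    mine_class(root, min_support, out)
    return out


if __name__ == "__main__":
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
               {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(frequent_itemsets(baskets, 3).items()):
        print(itemset, support)
```

In this sketch the support of a candidate is obtained by intersecting two id sets, so no further pass over the original transactions is needed once the vertical layout is built. The sketch enumerates every frequent itemset; the traversal techniques for jumping directly to the long itemsets are not covered here.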

Index Terms:
Association rules, frequent itemsets, equivalence classes, maximal cliques, lattices, data mining.
Citation:
Mohammed J. Zaki, "Scalable Algorithms for Association Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 3, pp. 372-390, May-June 2000, doi:10.1109/69.846291