MAFIA: A Maximal Frequent Itemset Algorithm
November 2005 (vol. 17 no. 11)
pp. 1490-1504
We present a new algorithm for mining maximal frequent itemsets from a transactional database. The search strategy of the algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms that significantly improve mining performance. Our implementation for support counting combines a vertical bitmap representation of the data with an efficient bitmap compression scheme. In a thorough experimental analysis, we isolate the effects of individual components of MAFIA, including search space pruning techniques and adaptive compression. We also compare our performance with previous work by running tests on very different types of data sets. Our experiments show that MAFIA performs best when mining long itemsets and outperforms other algorithms on dense data by a factor of 3 to 30.
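
To make the two central ideas in the abstract concrete, the following is a minimal Python sketch of a vertical bitmap representation combined with a depth-first search for maximal frequent itemsets. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, Python integers stand in for bitmaps, the only pruning is the basic infrequent-extension cutoff plus a superset check, and MAFIA's own pruning mechanisms and bitmap compression are not reproduced.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def build_vertical_bitmaps(transactions: Sequence[Set[str]]) -> Dict[str, int]:
    """Map each item to an integer whose bit i is set when transaction i contains the item."""
    bitmaps: Dict[str, int] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap: int) -> int:
    """Support of an itemset is the number of set bits in its bitmap."""
    return bin(bitmap).count("1")


def maximal_frequent_itemsets(
    transactions: Sequence[Set[str]], min_support: int
) -> List[FrozenSet[str]]:
    """Depth-first enumeration of maximal frequent itemsets (illustrative sketch only)."""
    bitmaps = build_vertical_bitmaps(transactions)
    # Only frequent single items can appear in a frequent itemset.
    items = sorted(i for i, b in bitmaps.items() if support(b) >= min_support)
    all_tids = (1 << len(transactions)) - 1
    maximal: List[FrozenSet[str]] = []

    def dfs(head: Tuple[str, ...], head_bitmap: int, tail: List[str]) -> None:
        extended = False
        for pos, item in enumerate(tail):
            # Support counting is a bitwise AND of the two vertical bitmaps.
            new_bitmap = head_bitmap & bitmaps[item]
            if support(new_bitmap) >= min_support:
                extended = True
                dfs(head + (item,), new_bitmap, tail[pos + 1:])
        if not extended and head:
            candidate = frozenset(head)
            # Keep the candidate only if no known maximal itemset already contains it.
            if not any(candidate <= found for found in maximal):
                maximal.append(candidate)

    dfs((), all_tids, items)
    return maximal


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c", "d"}, {"d"}]
    for itemset in maximal_frequent_itemsets(data, min_support=2):
        print(sorted(itemset))
```

Because items are explored in a fixed order, any frequent superset of a leaf itemset is reached earlier in the traversal, so the superset check against already-found results is enough to keep only maximal itemsets in this sketch.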

Index Terms:
Itemset mining, maximal itemsets, transactional databases.
Citation:
Doug Burdick, Manuel Calimlim, Jason Flannick, Johannes Gehrke, Tomi Yiu, "MAFIA: A Maximal Frequent Itemset Algorithm," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1490-1504, Nov. 2005, doi:10.1109/TKDE.2005.183