Finding Interesting Associations without Support Pruning
January/February 2001 (vol. 13 no. 1)
pp. 64-78

Abstract—Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is effective only when the rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar web documents, clustering, and collaborative filtering, where the rules of interest have comparatively few instances in the data. In these cases, we must look for highly correlated items, or possibly even causal relationships between infrequent items. We develop a family of algorithms for solving this problem, employing a combination of random sampling and hashing techniques. We provide analysis of the algorithms developed and conduct experiments on real and synthetic data to obtain a comparative performance analysis.
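
The abstract does not spell out the algorithms, but the index terms list min hashing. As a rough illustration of that general technique only (not the authors' method), the sketch below estimates the Jaccard similarity of two item columns from short signatures, without any support threshold. Each column is represented as the set of row ids in which the item occurs. The function names, the linear hash family, and the parameter values are all assumptions made for this example.

```python
import random

# Modulus for the hash family: the Mersenne prime 2^31 - 1.
# Assumption: every row id is a non-negative integer below this value.
PRIME = 2_147_483_647


def minhash_signature(rows, num_hashes=100, seed=0):
    """Return a min-hash signature for a non-empty set of row ids.

    Each of the num_hashes functions h(r) = (a*r + b) mod PRIME acts as
    an approximate random permutation of the rows; the signature keeps
    the minimum hash value the column attains under each function.
    """
    rng = random.Random(seed)  # same seed => same hash functions for every column
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(num_hashes)]
    return [min((a * r + b) % PRIME for r in rows) for a, b in params]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree (estimates Jaccard similarity)."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Two rare items that appear in only a handful of rows out of a large
    # table, yet almost always together: low support, high similarity.
    col_x = {17, 4021, 99873, 250000, 731555}
    col_y = {17, 4021, 99873, 250000, 800001}
    sig_x = minhash_signature(col_x)
    sig_y = minhash_signature(col_y)
    print("exact:", jaccard(col_x, col_y))
    print("estimate:", estimate_similarity(sig_x, sig_y))
```

The signature length trades accuracy for space: more hash functions give a tighter estimate at the cost of a longer signature per column. Because the signature depends only on which rows contain the item, a pair of infrequent items is treated no differently from a frequent pair.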

Index Terms:
Data mining, association rules, similarity metric, min hashing, locality sensitive hashing.
Citation:
Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Jeffrey D. Ullman, Cheng Yang, "Finding Interesting Associations without Support Pruning," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 1, pp. 64-78, Jan.-Feb. 2001, doi:10.1109/69.908981