Issue No.05 - May (2009 vol.21)
pp: 699-713
Aris Gkoulalas-Divanis, University of Thessaly, Volos
Vassilios S. Verykios, University of Thessaly, Volos
ABSTRACT
In this paper, we propose a novel, exact border-based approach that provides an optimal solution for hiding sensitive frequent itemsets by (i) minimally extending the original database with a synthetically generated part, the database extension; (ii) formulating the creation of the database extension as a constraint satisfaction problem; (iii) mapping the constraint satisfaction problem to an equivalent binary integer programming problem; (iv) exploiting underutilized synthetic transactions to proportionally increase the support of non-sensitive itemsets; (v) minimally relaxing the constraint satisfaction problem to provide an approximate solution close to the optimal one when an ideal solution does not exist; and (vi) partitioning the universe of items to increase the efficiency of the proposed hiding algorithm. Extending the original database for sensitive itemset hiding is proved to yield optimal solutions for a broader set of hiding problems than previous approaches, and to yield solutions of higher quality. Moreover, the application of binary integer programming enables the simultaneous hiding of the sensitive itemsets and thus allows for the identification of globally optimal solutions.
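As a rough illustration of the idea behind point (i), the sketch below computes how many synthetic transactions would have to be appended so that each sensitive itemset falls below a minimum frequency threshold. It is not the paper's algorithm: the function names, the toy database, and the assumptions that "hidden" means relative support strictly below the threshold and that no appended transaction contains a sensitive itemset are all illustrative. It also says nothing about which items the appended transactions should contain, which is the part the abstract assigns to the constraint satisfaction and binary integer programming formulation.

```python
import math
from fractions import Fraction


def support_count(database, itemset):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for transaction in database if items <= set(transaction))


def min_extension_size(database, sensitive_itemsets, min_freq):
    """Smallest number q of appended transactions that hides all sensitive itemsets.

    Assumptions (illustrative, not taken from the paper):
    - an itemset is hidden when count / (n + q) < min_freq (strict inequality);
    - none of the q appended transactions contains a sensitive itemset,
      so each sensitive support count stays unchanged.
    """
    n = len(database)
    threshold = Fraction(min_freq)  # exact arithmetic avoids rounding surprises
    q = 0
    for itemset in sensitive_itemsets:
        count = support_count(database, itemset)
        # count / (n + q) < threshold  <=>  q > count / threshold - n
        needed = math.floor(Fraction(count) / threshold - n) + 1
        q = max(q, needed)
    return max(q, 0)


# Hypothetical toy database of five transactions.
toy_db = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"c"}, {"a"}]
extension_size = min_extension_size(toy_db, [{"a", "b"}], 0.5)
```

In the toy database, {a, b} appears in 3 of 5 transactions, so it is frequent at a threshold of 0.5. Appending one transaction gives 3/6, which still meets the threshold, while appending two gives 3/7, which falls below it.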
INDEX TERMS
Data mining, Mining methods and algorithms
CITATION
Aris Gkoulalas-Divanis, Vassilios S. Verykios, "Exact Knowledge Hiding through Database Extension", IEEE Transactions on Knowledge & Data Engineering, vol.21, no. 5, pp. 699-713, May 2009, doi:10.1109/TKDE.2008.199