Issue No.05 - May (2009 vol.21)

pp: 699-713

Aris Gkoulalas-Divanis, University of Thessaly, Volos

Vassilios S. Verykios, University of Thessaly, Volos

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.199

ABSTRACT

In this paper, we propose a novel, exact border-based approach that provides an optimal solution for the hiding of sensitive frequent itemsets by (i) minimally extending the original database with a synthetically generated part, the database extension, (ii) formulating the creation of the database extension as a constraint satisfaction problem, (iii) mapping the constraint satisfaction problem to an equivalent binary integer programming problem, (iv) exploiting underutilized synthetic transactions to proportionally increase the support of non-sensitive itemsets, (v) minimally relaxing the constraint satisfaction problem to provide an approximate solution close to the optimal one when an ideal solution does not exist, and (vi) partitioning the universe of items to increase the efficiency of the proposed hiding algorithm. Extending the original database for sensitive itemset hiding is proved to yield optimal solutions for a broader set of hiding problems than previous approaches and to yield solutions of higher quality. Moreover, the application of binary integer programming enables the simultaneous hiding of all sensitive itemsets and thus allows for the identification of globally optimal solutions.
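To make the idea of hiding by database extension concrete, the following is a minimal sketch on a toy database. It is not the authors' border-based formulation: it checks every itemset instead of only the border, and it replaces the binary integer program with brute-force enumeration over the binary entries of the extension. The names `find_extension`, `support_count`, `all_itemsets`, `is_frequent`, and the parameter `max_q` are hypothetical and used for illustration only.

```python
from itertools import combinations, product


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def all_itemsets(items):
    """All non-empty itemsets over `items` (fine for a toy universe only)."""
    items = sorted(items)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def is_frequent(itemset, transactions, min_sup):
    """Frequent means relative support of at least `min_sup`."""
    return support_count(itemset, transactions) >= min_sup * len(transactions)


def find_extension(db, sensitive, min_sup, max_q=3):
    """Search for a smallest set of synthetic transactions to append to `db`.

    Assumed constraints for this sketch: every sensitive itemset (and each of
    its supersets) is infrequent in the extended database, and every other
    itemset keeps the frequent/infrequent status it had in `db`.
    """
    items = sorted(set().union(*db))
    to_hide = {x for x in all_itemsets(items) if any(s <= x for s in sensitive)}
    keep = {x: is_frequent(x, db, min_sup)
            for x in all_itemsets(items) if x not in to_hide}
    for q in range(max_q + 1):  # try smaller extensions first
        # Each candidate is a q-by-|items| binary matrix, flattened row by row.
        for bits in product([0, 1], repeat=q * len(items)):
            ext = [frozenset(item for j, item in enumerate(items)
                             if bits[i * len(items) + j])
                   for i in range(q)]
            new_db = db + ext
            if any(is_frequent(x, new_db, min_sup) for x in to_hide):
                continue
            if all(is_frequent(x, new_db, min_sup) == status
                   for x, status in keep.items()):
                return ext
    return None  # no extension of at most max_q transactions satisfies all constraints


# Toy database over items a, b, c; {a, b} is sensitive, threshold 50%.
db = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
extension = find_extension(db, [frozenset("ab")], min_sup=0.5)
# For this input, the search returns the single synthetic transaction {a, c}.
```

In this toy case one added transaction is enough: it raises the database size so that {a, b} drops below the threshold, and it must contain a and c (but not b) so that c and {a, c} remain frequent. The enumeration is exponential in the number of binary variables, which is why a solver-based formulation is the practical choice for realistic sizes.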

INDEX TERMS

Data mining, Mining methods and algorithms

CITATION

Aris Gkoulalas-Divanis, Vassilios S. Verykios, "Exact Knowledge Hiding through Database Extension",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 21, no. 5, pp. 699-713, May 2009, doi:10.1109/TKDE.2008.199