2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
Junqiang Liu, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
Ke Wang, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
Publishing person-specific data while protecting privacy is an important problem. Existing algorithms that enforce the privacy principle called l-diversity are heuristic because the underlying problem is NP-hard. Several questions remain open: Can an optimal solution yield a significant gain in data utility over heuristic ones? Can utility be improved by setting a distinct privacy threshold for each sensitive value? Is it practical to find an optimal solution efficiently for real-world datasets? This paper addresses these questions. Specifically, we present a pruning-based algorithm for finding an optimal solution to an extended form of the l-diversity problem. The novelty lies in several strong techniques: a novel structure for enumerating all solutions, methods for estimating cost lower bounds, and strategies for dynamically arranging the enumeration order and updating lower bounds. The approach can be instantiated with any reasonable cost metric. Experiments on real-world datasets show that our algorithm is efficient and improves data utility.
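The abstract does not spell out the extended form of l-diversity, so the following is only an illustrative sketch of the "distinct privacy threshold per sensitive value" idea. It assumes, as a reading of that phrase rather than the paper's definition, that an anonymized table is a collection of groups of records and that each sensitive value may make up at most its own threshold fraction of any group. The function name `satisfies_thresholds`, the `default` parameter, and the sample values are hypothetical.

```python
from collections import Counter


def satisfies_thresholds(groups, thresholds, default=1.0):
    """Check a per-value frequency constraint on every group.

    groups     -- iterable of lists; each list holds the sensitive values
                  of the records placed in one anonymization group
    thresholds -- dict mapping a sensitive value to the maximum fraction
                  of a group it may occupy (an assumed interpretation)
    default    -- threshold used for values missing from `thresholds`
    """
    for group in groups:
        if not group:
            continue  # an empty group cannot violate any threshold
        size = len(group)
        for value, count in Counter(group).items():
            if count / size > thresholds.get(value, default):
                return False
    return True


# Hypothetical example: two groups of four records each.
groups = [
    ["flu", "flu", "hiv", "cold"],
    ["flu", "cold", "cold", "hiv"],
]
loose = {"hiv": 0.25, "flu": 0.5, "cold": 0.5}
strict = {"hiv": 0.2, "flu": 0.5, "cold": 0.5}

print(satisfies_thresholds(groups, loose))
print(satisfies_thresholds(groups, strict))
```

Under this reading, tightening the threshold for one value (here "hiv") only constrains that value, whereas a single global threshold would constrain every value equally. That difference is one plausible source of the utility gain the abstract asks about.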
J. Liu and K. Wang, "On optimal anonymization for l+-diversity," 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA, USA, 2010, pp. 213-224.