Issue No. 11, November 2009 (vol. 21)
pp. 1643-1647
Michael Laszlo, Nova Southeastern University, Fort Lauderdale
Sumitra Mukherjee, Nova Southeastern University, Fort Lauderdale
ABSTRACT
The NP-hard microaggregation problem seeks a partition of data points into groups of minimum specified size $k$ so as to minimize the sum of the squared Euclidean distances of every point to its group's centroid. One recent heuristic provides an $O(k^3)$ guarantee for this objective function and an $O(k^2)$ guarantee for a version of the problem that seeks to minimize the sum of the (unsquared) distances of the points to their groups' centroids. This paper establishes approximation bounds for another microaggregation heuristic, providing better approximation guarantees of $O(k^2)$ for the squared distance measure and $O(k)$ for the distance measure.
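To make the two objective functions concrete, here is a minimal Python sketch, assuming points are given as tuples of coordinates and a partition as a list of groups. The helper names (`sse`, `sum_dist`, `is_feasible`, `partitions_min_size`, `optimum`) are hypothetical and illustrative; this is not the heuristic analyzed in the paper. It evaluates the squared-distance and distance objectives and, for very small inputs only, finds the exact optimum by exhaustive enumeration of feasible partitions, which is the quantity an approximation ratio is measured against.

```python
from itertools import combinations
from math import dist

Point = tuple[float, ...]
Partition = list[list[Point]]


def centroid(group: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty group."""
    n = len(group)
    return tuple(sum(coords) / n for coords in zip(*group))


def sse(partition: Partition) -> float:
    """Squared-distance objective: sum of squared Euclidean distances to group centroids."""
    return sum(dist(p, centroid(g)) ** 2 for g in partition for p in g)


def sum_dist(partition: Partition) -> float:
    """Distance objective: sum of (unsquared) Euclidean distances to group centroids."""
    return sum(dist(p, centroid(g)) for g in partition for p in g)


def is_feasible(partition: Partition, k: int) -> bool:
    """Every group must contain at least k points."""
    return all(len(g) >= k for g in partition)


def partitions_min_size(points: list[Point], k: int):
    """Yield every partition of `points` whose groups all have size >= k.

    Exponential time: intended only for tiny instances (the problem is NP-hard).
    """
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    # The group containing `first` takes k-1 or more of the remaining points.
    for size in range(k - 1, len(rest) + 1):
        for combo in combinations(range(len(rest)), size):
            group = [first] + [rest[i] for i in combo]
            remaining = [rest[i] for i in range(len(rest)) if i not in combo]
            # Leftovers that cannot form valid groups simply yield nothing.
            for tail in partitions_min_size(remaining, k):
                yield [group] + tail


def optimum(points: list[Point], k: int, objective=sse):
    """Exact optimal partition and its cost under `objective` (requires len(points) >= k)."""
    best = min(partitions_min_size(list(points), k), key=objective)
    return best, objective(best)


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    best, cost = optimum(pts, k=2)
    print(best, cost)
    print(optimum(pts, k=2, objective=sum_dist)[1])
```

On small test sets, dividing a heuristic partition's cost by the exhaustive optimum gives the empirical ratio that guarantees such as $O(k^2)$ (squared distances) and $O(k)$ (distances) bound from above; the enumeration itself grows exponentially and is not a practical algorithm.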
INDEX TERMS
Data security, disclosure control, microdata protection, microaggregation, k-anonymity, approximation algorithms, graph partitioning, information loss.
CITATION
Michael Laszlo, Sumitra Mukherjee, "Approximation Bounds for Minimum Information Loss Microaggregation", IEEE Transactions on Knowledge & Data Engineering, vol. 21, no. 11, pp. 1643-1647, November 2009, doi:10.1109/TKDE.2009.78