Issue No. 08 - August (2011 vol. 23)

pp: 1200-1214

Xiaokui Xiao, Nanyang Technological University, Singapore

Guozhang Wang, Cornell University, Ithaca

Johannes Gehrke, Cornell University, Ithaca

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.247

ABSTRACT

Privacy-preserving data publishing has attracted considerable research interest in recent years. Among the existing solutions, ε-differential privacy provides the strongest privacy guarantee. Existing data publishing methods that achieve ε-differential privacy, however, offer little data utility. In particular, if the output data set is used to answer count queries, the noise in the query answers can be proportional to the number of tuples in the data, which renders the results useless. In this paper, we develop a data publishing technique that ensures ε-differential privacy while providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a range. The core of our solution is a framework that applies wavelet transforms on the data before adding noise to it. We present instantiations of the proposed framework for both ordinal and nominal data, and we provide a theoretical analysis on their privacy and utility guarantees. In an extensive experimental study on both real and synthetic data, we show the effectiveness and efficiency of our solution.
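To make the "transform first, then add noise" idea concrete, the following is a minimal one-dimensional sketch and not the authors' implementation. It assumes a single ordinal attribute whose domain size is a power of two, an averaging form of the Haar wavelet transform, and Laplace noise whose scale shrinks for coefficients that cover more of the domain. The function names, the weighting, and the choice of `lam` are illustrative assumptions. In this sketch, changing one count by 1 moves the overall average by 1/m and moves one detail coefficient per level by 1/(block size), so the weighted change is 1 + log2(m); the factor 2 conservatively covers replacing one tuple with another.

```python
import math
import random


def haar_forward(freq):
    """Averaging Haar transform of a frequency vector (length must be a power of two).

    Returns (base, details): base is the overall average, and details[k] holds
    the detail coefficients for blocks of 2**(k+1) adjacent domain values.
    """
    m = len(freq)
    assert m > 0 and m & (m - 1) == 0, "length must be a power of two"
    averages = list(freq)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])
        averages = [(a + b) / 2 for a, b in pairs]
    return averages[0], details


def haar_inverse(base, details):
    """Rebuild the frequency vector from the output of haar_forward."""
    averages = [base]
    for level in reversed(details):  # coarsest level first
        finer = []
        for avg, d in zip(averages, level):
            finer.extend((avg + d, avg - d))
        averages = finer
    return averages


def laplace(scale, rng):
    """Zero-mean Laplace sample: the difference of two Exp(1) draws, scaled."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_frequencies(freq, epsilon, rng):
    """Perturb the wavelet coefficients, then reconstruct noisy counts.

    Assumption (illustrative): lam = 2 * (1 + log2(m)) / epsilon, with each
    coefficient receiving Laplace noise of scale lam / (block size it covers).
    """
    m = len(freq)
    base, details = haar_forward(freq)
    lam = 2 * (1 + math.log2(m)) / epsilon
    noisy_base = base + laplace(lam / m, rng)
    noisy_details = [
        [d + laplace(lam / 2 ** (k + 1), rng) for d in level]
        for k, level in enumerate(details)
    ]
    return haar_inverse(noisy_base, noisy_details)


def range_count(freq, lo, hi):
    """Range-count query: number of tuples with attribute value in [lo, hi]."""
    return sum(freq[lo:hi + 1])


if __name__ == "__main__":
    counts = [4, 2, 5, 1, 0, 3, 7, 2]  # hypothetical histogram over 8 values
    published = noisy_frequencies(counts, epsilon=1.0, rng=random.Random(0))
    print(range_count(counts, 2, 5), range_count(published, 2, 5))
```

The intuition behind this design is that a range sum over the reconstructed counts depends on the base coefficient and on at most two detail coefficients per level (those whose block only partly overlaps the range), so the noise in the answer accumulates over a logarithmic number of terms rather than over every value in the range.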

INDEX TERMS

Privacy-preserving data publishing, differential privacy, wavelets.

CITATION

Xiaokui Xiao, Guozhang Wang, Johannes Gehrke, "Differential Privacy via Wavelet Transforms," *IEEE Transactions on Knowledge & Data Engineering*, vol. 23, no. 8, pp. 1200-1214, August 2011, doi:10.1109/TKDE.2010.247