A Tree-Based Data Perturbation Approach for Privacy-Preserving Data Mining
September 2006 (vol. 18 no. 9)
pp. 1278-1283
Xiao-Bai Li, Sumit Sarkar, IEEE Computer Society
Due to growing concerns about the privacy of personal information, organizations that use their customers' records in data mining activities must take action to protect the privacy of the individuals involved. A frequently used disclosure protection method is data perturbation. When perturbation is used for data mining, it should preserve the statistical relationships between attributes while providing adequate protection for individual confidential data. To achieve this goal, we propose a kd-tree-based perturbation method, which recursively partitions a data set into smaller subsets so that the data records within each subset become more homogeneous after each partition. The confidential data in each final subset are then perturbed using the subset average. An experimental study demonstrates the effectiveness of the proposed method.


Index Terms:
Privacy, data mining, data perturbation, microaggregation, kd-trees.
Citation:
Xiao-Bai Li, Sumit Sarkar, "A Tree-Based Data Perturbation Approach for Privacy-Preserving Data Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 9, pp. 1278-1283, Sept. 2006, doi:10.1109/TKDE.2006.136