Issue No.03 - May/June (2011 vol.8)
pp: 337-352
Barbara Carminati, University of Insubria, Varese
Elena Ferrari, University of Insubria, Varese
Kian-Lee Tan, National University of Singapore, Singapore
Most existing privacy-preserving techniques, such as k-anonymity methods, are designed for static data sets. As such, they cannot be applied to streaming data, which are continuous, transient, and usually unbounded. Moreover, streaming applications need strong guarantees on the maximum allowed delay between incoming data and the corresponding anonymized output. To cope with these requirements, in this paper we present Continuously Anonymizing STreaming data via adaptive cLustEring (CASTLE), a cluster-based scheme that anonymizes data streams on the fly and, at the same time, ensures the freshness of the anonymized data by satisfying specified delay constraints. We further show how CASTLE can be easily extended to handle $\ell$-diversity. Our extensive performance study shows that CASTLE is efficient and effective with respect to the quality of the output data.
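The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea of delay-constrained, cluster-based stream anonymization; it is not the CASTLE algorithm from the paper. All names (`anonymize_stream`, `release_oldest`, `k`, `delta`) are hypothetical, each tuple is assumed to carry a single numeric quasi-identifier, and delay is assumed to be measured in number of later arrivals.

```python
from collections import deque


def release_oldest(buffer, k):
    """Release the oldest buffered tuple, grouped with its k-1 nearest neighbours.

    Returns a list of (position, generalized_value) pairs. If fewer than k
    tuples are buffered, the oldest tuple is suppressed (value None).
    """
    oldest = buffer[0]
    if len(buffer) < k:
        buffer.popleft()
        return [(oldest[0], None)]
    # Pick the k-1 buffered tuples whose values are closest to the oldest one.
    others = sorted(list(buffer)[1:], key=lambda item: abs(item[1] - oldest[1]))
    cluster = [oldest] + others[:k - 1]
    low = min(value for _, value in cluster)
    high = max(value for _, value in cluster)
    for item in cluster:
        buffer.remove(item)
    # Every cluster member is published with the same range [low, high].
    return [(pos, (low, high)) for pos, _ in sorted(cluster)]


def anonymize_stream(values, k, delta):
    """Yield (position, range) pairs so that each published range is shared
    by at least k tuples and no tuple waits for more than delta later arrivals.
    """
    buffer = deque()  # (position, value) pairs awaiting output, oldest first
    for pos, value in enumerate(values):
        buffer.append((pos, value))
        # The oldest tuple expires once delta newer tuples have arrived.
        while buffer and pos - buffer[0][0] >= delta:
            yield from release_oldest(buffer, k)
    # End of stream (an assumption of this sketch): flush what is left.
    while buffer:
        yield from release_oldest(buffer, k)
```

For instance, `list(anonymize_stream([10, 11, 50, 12, 52, 51], k=2, delta=2))` publishes the first two tuples with the shared range `(10, 11)` as soon as the third tuple arrives. Suppressing tuples that cannot reach a group of size `k` is a simplification chosen for this sketch; the abstract does not say how the paper handles that case.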
Data stream, privacy-preserving data mining, anonymity.
Barbara Carminati, Elena Ferrari, Kian-Lee Tan, "CASTLE: Continuously Anonymizing Data Streams", IEEE Transactions on Dependable and Secure Computing, vol.8, no. 3, pp. 337-352, May/June 2011, doi:10.1109/TDSC.2009.47