JS-Reduce: Defending Your Data from Sequential Background Knowledge Attacks
May/June 2012 (vol. 9 no. 3)
pp. 387-400
Daniele Riboni, University of Milan, Milano
Linda Pareschi, University of Milan, Milano
Claudio Bettini, University of Milan, Milano
Web queries, credit card transactions, and medical records are examples of transaction data that flow into corporate data stores and often reveal associations between individuals and sensitive information. Such data are commonly released serially, in nonaggregated form, to partner institutions or data analysis centers. In this paper, we show that adversaries observing multiple data releases can easily exploit correlations among the sensitive values associated with the same individuals in different releases to violate users' privacy, even when state-of-the-art privacy protection techniques are applied. We show how an adversary can actually obtain this sequential background knowledge and use it to identify, with high confidence, the sensitive values of an individual. Our proposed defense algorithm is based on Jensen-Shannon divergence; experiments show its superiority over other applicable solutions. To the best of our knowledge, this is the first work that systematically investigates the role of sequential background knowledge in the serial release of transaction data.
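
For readers unfamiliar with the measure named in the abstract, the following is a minimal sketch of how the Jensen-Shannon divergence between two discrete probability distributions can be computed. It is not the JS-Reduce algorithm itself, which this page does not describe. The function names and the example distributions over three sensitive values are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    m is the pointwise average of p and q, so m_i > 0 whenever p_i > 0
    or q_i > 0, and both KL terms below are always finite.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical distributions over three sensitive values for two
# individuals (illustrative numbers, not taken from the paper).
alice = [0.7, 0.2, 0.1]
bob = [0.1, 0.2, 0.7]

print(js_divergence(alice, bob))    # larger when the distributions differ
print(js_divergence(alice, alice))  # identical distributions give 0.0
```

With base-2 logarithms the value is symmetric in its arguments and bounded between 0 (identical distributions) and 1 (distributions with disjoint support), which makes it convenient for comparing how similar two individuals' sensitive-value distributions are.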

Index Terms:
Privacy-preserving release of transaction data, anonymity, sequential background knowledge.
Daniele Riboni, Linda Pareschi, Claudio Bettini, "JS-Reduce: Defending Your Data from Sequential Background Knowledge Attacks," IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 3, pp. 387-400, May-June 2012, doi:10.1109/TDSC.2012.19