2015 13th Annual Conference on Privacy, Security and Trust (PST) (2015)
Izmir, Turkey
July 21, 2015 to July 23, 2015
ISBN: 978-1-4673-7827-7
pp: 28-35
Josep Domingo-Ferrer, Universitat Rovira i Virgili, Dept. of Computer Engineering and Maths, UNESCO Chair in Data Privacy, Av. Països Catalans 26, 43007 Tarragona, Catalonia
Sara Ricci, Universitat Rovira i Virgili, Dept. of Computer Engineering and Maths, UNESCO Chair in Data Privacy, Av. Països Catalans 26, 43007 Tarragona, Catalonia
Jordi Soria-Comas, Universitat Rovira i Virgili, Dept. of Computer Engineering and Maths, UNESCO Chair in Data Privacy, Av. Països Catalans 26, 43007 Tarragona, Catalonia
ABSTRACT
Before releasing an anonymized data set, the data protector must know how safe the data set is, that is, how much disclosure risk the release incurs. If no privacy model is used to select specific privacy guarantees prior to anonymization, disclosure risk must be assessed a posteriori on the anonymized data set and, if the result is not satisfactory, anonymization must be repeated with stricter privacy parameters. Even if a privacy model is used, it may still be advisable to evaluate disclosure empirically on the anonymized data set, especially if the privacy model parameters have been relaxed to improve data utility. Record linkage is a general methodology for posterior disclosure risk assessment, whereby the data protector attempts to recreate the attacker's re-identification scenario. An important limitation of record linkage is that it usually requires the data protector to make restrictive assumptions about the attacker's background knowledge. To overcome this limitation, we present a maximum-knowledge attacker model and then specify and compare several record linkage tests for such a worst-case attacker. Our tests are based on comparing the distribution of linkage distances between the original and the anonymized data set with the distribution of distances between one of these two data sets and a random data set. The more similar the distributions, the more plausibly deniable are the record linkages claimed by an attacker. Because attaining zero disclosure risk for all records is too costly in terms of utility, a less demanding alternative is presented whose goal is to reduce the maximum per-record disclosure risk.
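The distance-distribution comparison described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' test: Euclidean distance, nearest-neighbour linkage of each anonymized record to the original data, a random data set drawn uniformly within each attribute's observed range, and the two-sample Kolmogorov-Smirnov statistic as the measure of how different the two distance distributions are. All function names are hypothetical. Under this proxy, a gap near 0 means the claimed linkages look no closer than links to random records (plausibly deniable), while a gap near 1 means they are clearly not.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two numeric records (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_distances(targets, candidates):
    """For each record in targets, distance to its closest record in candidates."""
    return [min(euclidean(t, c) for c in candidates) for t in targets]


def random_dataset(reference, n, rng):
    """Hypothetical null model: uniform draws within each attribute's observed range."""
    lows = [min(col) for col in zip(*reference)]
    highs = [max(col) for col in zip(*reference)]
    return [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in range(n)]


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    ia = ib = 0
    for p in sorted(set(a) | set(b)):
        # Advance each pointer past all values <= p to evaluate both CDFs at p.
        while ia < len(a) and a[ia] <= p:
            ia += 1
        while ib < len(b) and b[ib] <= p:
            ib += 1
        gap = max(gap, abs(ia / len(a) - ib / len(b)))
    return gap


def linkage_deniability_gap(original, anonymized, seed=0):
    """Compare anonymized-to-original linkage distances with random-to-original ones."""
    rng = random.Random(seed)
    noise = random_dataset(original, len(anonymized), rng)
    linked = nearest_distances(anonymized, original)
    baseline = nearest_distances(noise, original)
    return ks_statistic(linked, baseline)


if __name__ == "__main__":
    rng = random.Random(1)
    original = [[rng.random(), rng.random()] for _ in range(50)]
    # Very light additive noise: anonymized records stay almost on top of the originals.
    masked = [[x + rng.gauss(0, 0.001), y + rng.gauss(0, 0.001)] for x, y in original]
    print(linkage_deniability_gap(original, masked))  # close to 1: linkages not deniable
```

A maximum-knowledge attacker is modelled here simply by giving the function the full original data set; stronger masking should push the gap toward 0, at the cost of utility.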
INDEX TERMS
Couplings, Noise, Risk management, Dictionaries, Data privacy, Privacy, Data models
CITATION
Josep Domingo-Ferrer, Sara Ricci, Jordi Soria-Comas, "Disclosure risk assessment via record linkage by a maximum-knowledge attacker", 2015 13th Annual Conference on Privacy, Security and Trust (PST), pp. 28-35, 2015, doi:10.1109/PST.2015.7232951