Issue No. 03 - March (2010 vol. 22)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.88
Thomas A. Lasko, Google, Inc., Mountain View
Staal A. Vinterbo, Brigham and Women's Hospital, Boston
The goal of data anonymization is to allow the release of scientifically useful data in a form that protects the privacy of its subjects. This requires more than simply removing personal identifiers from the data because an attacker can still use auxiliary information to infer sensitive individual information. Additional perturbation is necessary to prevent these inferences, and the challenge is to perturb the data in a way that preserves its analytic utility. No existing anonymization algorithm provides both perfect privacy protection and perfect analytic utility. We make the new observation that anonymization algorithms are not required to operate in the original vector-space basis of the data, and many algorithms can be improved by operating in a judiciously chosen alternate basis. A spectral basis derived from the data's eigenvectors is one that can provide substantial improvement. We introduce the term spectral anonymization to refer to an algorithm that uses a spectral basis for anonymization, and give two illustrative examples. We also propose new measures of privacy protection that are more general and more informative than existing measures, and a principled reference standard with which to define adequate privacy protection.
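The core idea of the abstract, perturbing the data in a spectral basis rather than in the original one, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' published algorithm: the function name `spectral_swap` is hypothetical, the spectral basis is taken from a singular value decomposition of the centered data, and the perturbation chosen here (independently shuffling each spectral coordinate across records) is only one simple possibility.

```python
import numpy as np

def spectral_swap(X, seed=None):
    """Illustrative sketch: shuffle records' coordinates in a spectral basis.

    Hypothetical helper, not the paper's reference implementation.
    X is an (n_records, n_features) numeric array.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Center the data so the SVD yields the principal (eigenvector) basis.
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)

    # Perturb in the spectral basis: permute each column of U independently,
    # which breaks the link between a record and its spectral coordinates
    # while keeping each coordinate's marginal distribution intact.
    U_perm = np.column_stack(
        [rng.permutation(U[:, j]) for j in range(U.shape[1])]
    )

    # Map the perturbed coordinates back to the original basis.
    return (U_perm * s) @ Vt + mean
```

Because the spectral coordinates of centered data are uncorrelated, shuffling them independently leaves the feature means unchanged and the covariance structure approximately intact, which is the kind of utility-preserving perturbation the abstract argues a well-chosen basis makes possible. Whether this provides adequate privacy protection for a given data set is a separate question that the sketch does not address.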
Privacy, computational disclosure control, machine learning.
T. A. Lasko and S. A. Vinterbo, "Spectral Anonymization of Data," in IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 3, pp. 437-446, 2010.