Issue No. 02, February 2011 (vol. 23)
pp: 161-174
Gabriel Ghinita , Purdue University, West Lafayette
Panos Kalnis , King Abdullah University of Science and Technology (KAUST), Jeddah
Yufei Tao , Chinese University of Hong Kong, Hong Kong
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.101
ABSTRACT
Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximize data utility). Existing techniques work well for fixed-schema data, with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket data), which involve hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1) reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using real-life data sets, that all our methods clearly outperform existing state of the art. Among the proposed techniques, NN-search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead. The data transformation based on Gray code sorting performs best in terms of both data utility and execution time.
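As a rough illustration of the Gray encoding-based sorting idea mentioned in the abstract, the sketch below treats each transaction as a binary item vector, reads that vector as a Gray codeword, and orders transactions by the codeword's position in the reflected Gray sequence, so that neighbors in the sorted order tend to differ in few items. This is only one plausible reading of the abstract, not the authors' implementation: the names `gray_rank` and `gray_sort`, the item-to-bit mapping, and the choice to represent transactions as sets of integer item indices are all assumptions made here, and the group-formation heuristic is not shown because the abstract does not describe it.

```python
from __future__ import annotations


def gray_rank(codeword: int) -> int:
    """Position of `codeword` in the reflected binary Gray sequence.

    Decodes Gray to plain binary by XOR-ing together all right shifts
    of the codeword.
    """
    rank = 0
    while codeword:
        rank ^= codeword
        codeword >>= 1
    return rank


def to_bitmask(transaction: set[int]) -> int:
    """Encode a transaction as an integer; item i sets bit i (an assumed mapping)."""
    mask = 0
    for item in transaction:
        mask |= 1 << item
    return mask


def gray_sort(transactions: list[set[int]]) -> list[set[int]]:
    """Order transactions by the Gray-sequence rank of their item vectors."""
    return sorted(transactions, key=lambda t: gray_rank(to_bitmask(t)))


if __name__ == "__main__":
    # Hypothetical toy basket data over three items (0, 1, 2).
    baskets = [{0, 1}, {2}, {0}, {1}, set()]
    for basket in gray_sort(baskets):
        print(sorted(basket))
```

In this sketch, adjacent transactions in the output would be natural candidates for the same anonymized group; how groups are actually formed from the sorted order is left to the paper itself.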
INDEX TERMS
Privacy, anonymity, transactional data.
CITATION
Gabriel Ghinita, Panos Kalnis, Yufei Tao, "Anonymous Publication of Sensitive Transactional Data", IEEE Transactions on Knowledge & Data Engineering, vol. 23, no. 2, pp. 161-174, February 2011, doi:10.1109/TKDE.2010.101