2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
ISBN: 978-1-4244-5445-7
pp: 1081-1092
Miguel Durazo, University of Sonora, Mexico
Bin Zhang, Hewlett-Packard Laboratories, Palo Alto, CA, USA
Umeshwar Dayal, Hewlett-Packard Laboratories, Palo Alto, CA, USA
Malu Castellanos, Hewlett-Packard Laboratories, Palo Alto, CA, USA
Perla Ruiz, University of Sonora, Mexico
Lily Jow, BIO, HSS, Hewlett-Packard, Cupertino, CA, USA
Ivo Jimenez, Hewlett-Packard Laboratories, Palo Alto, CA, USA
ABSTRACT
Improving the performance and functionality of database system optimizers requires experimentation on real customer data. Often these data are sensitive, and the only way to retain them is to apply a non-reversible transformation that obfuscates them. However, for the database optimizer to generate exactly the same query plans as it would for the sensitive data, the transformation has to preserve the order of the values and some important properties of the data distribution. Unfortunately, existing data obfuscation techniques do not preserve all of these properties and therefore are not applicable in this context. In this paper we present a Desensitizer tool that we developed for optimizer performance experiments on HP's Neoview high-availability data warehousing product. The tool is based on novel numeric and string desensitization algorithms that are agnostic to the database system. We explain the core concepts behind the algorithms, how they preserve the required data properties, and the important implementation decisions we made. We also present the architecture of the Desensitizer tool and the results of the extensive validation we conducted.
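To make the required properties concrete, here is a minimal sketch of one generic way to desensitize a numeric column while keeping order and equality intact. It is not the paper's algorithm (the abstract does not describe it); the function names, the seed, and the gap ranges are hypothetical choices for illustration. Each distinct value is assigned a new value, with strictly increasing random gaps between consecutive values, so sort order, distinct counts and the results of range predicates are unchanged, while the original magnitudes and spacing are hidden. Query constants that do not occur in the data are mapped to the midpoint between their neighbours' images. By design the ranks of the values remain visible; discarding the seed and the mapping table afterwards is what prevents the transformation from being recomputed.

```python
import bisect
import random
from typing import Dict, Sequence


def build_order_preserving_map(values: Sequence[float], seed: int) -> Dict[float, float]:
    """Assign each distinct value a new, strictly increasing value.

    Hypothetical sketch: the gaps between consecutive outputs are random, so
    the original spacing is lost, but order and equality are preserved.
    """
    rng = random.Random(seed)  # hypothetical secret seed; discard after use
    mapping: Dict[float, float] = {}
    current = rng.uniform(0.0, 1000.0)
    for v in sorted(set(values)):
        mapping[v] = current
        current += rng.uniform(1.0, 100.0)  # strictly positive gap
    return mapping


def map_constant(c: float, mapping: Dict[float, float]) -> float:
    """Map a query constant that may not occur in the data.

    A constant lying between two data values goes to the midpoint of their
    images, so predicates such as `x < c` select the same rows as before.
    """
    if c in mapping:
        return mapping[c]
    keys = sorted(mapping)
    i = bisect.bisect_left(keys, c)
    if i == 0:
        return mapping[keys[0]] - 1.0   # below every data value
    if i == len(keys):
        return mapping[keys[-1]] + 1.0  # above every data value
    return (mapping[keys[i - 1]] + mapping[keys[i]]) / 2.0


if __name__ == "__main__":
    salaries = [52000, 61000, 52000, 75000, 98000, 61000, 120000]
    m = build_order_preserving_map(salaries, seed=2010)
    masked = [m[s] for s in salaries]
    # Same number of distinct values, so equality-based estimates match.
    print(len(set(masked)) == len(set(salaries)))  # True
    # A range predicate selects the same rows once its constant is mapped too.
    c = 70000
    print([s < c for s in salaries] == [x < map_constant(c, m) for x in masked])  # True
```

Because the masked column has the same sorted order and the same duplicates as the original, histogram bucket counts and predicate selectivities computed over it coincide with those of the original column, which is the kind of property an optimizer's cardinality estimates depend on.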
CITATION
Miguel Durazo, Bin Zhang, Umeshwar Dayal, Malu Castellanos, Perla Ruiz, Lily Jow, Ivo Jimenez, "Data desensitization of customer data for use in optimizer performance experiments", 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pp. 1081-1092, 2010, doi:10.1109/ICDE.2010.5447793