Outsourced Similarity Search on Metric Data Assets
February 2012 (vol. 24 no. 2)
pp. 338-352
Man Lung Yiu, Hong Kong Polytechnic University, Hong Kong
Ira Assent, Aarhus University, Aarhus
Christian S. Jensen, Aarhus University, Aarhus
Panos Kalnis, KAUST, Thuwal
This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the data objects most similar to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data before it is supplied to the service provider, so that similarity queries are processed on the transformed data. The techniques provide trade-offs between query cost and accuracy, and they are further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.
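
The abstract does not spell out the transformations themselves, so the following is only a minimal sketch of the general setting, not the paper's method. It assumes a generic pivot-based mapping from the metric-space literature: the owner replaces each object with its distances to secret pivots, the server ranks objects by a triangle-inequality lower bound, and a trusted user refines the candidates with exact distances. All function names and the parameter k_prime are hypothetical, and this plain mapping gives no privacy guarantee on its own.

```python
import math

def euclidean(a, b):
    """Example metric; any distance satisfying the triangle inequality works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def transform(objects, pivots, dist):
    """Owner side (assumed): map each object to its distances to secret pivots."""
    return [[dist(o, p) for p in pivots] for o in objects]

def lower_bound(q_vec, o_vec):
    """Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p."""
    return max(abs(a - b) for a, b in zip(q_vec, o_vec))

def server_candidates(stored, q_vec, k_prime):
    """Server side (assumed): sees only distance vectors, returns k_prime ids."""
    ranked = sorted(range(len(stored)),
                    key=lambda i: lower_bound(q_vec, stored[i]))
    return ranked[:k_prime]

def user_refine(objects, candidates, query, k, dist):
    """Trusted user: exact distances on the candidates only.

    Here `objects` stands in for payloads the user could decrypt; a real
    deployment would fetch encrypted objects from the server instead.
    """
    return sorted(candidates, key=lambda i: dist(query, objects[i]))[:k]
```

In this sketch, k_prime is the knob behind a cost/accuracy trade-off: a larger candidate set costs more communication and refinement work but is more likely to contain the true nearest neighbors, and with k_prime equal to the data set size the result is exact.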

Index Terms:
Query processing, Security, integrity, and protection.
Man Lung Yiu, Ira Assent, Christian S. Jensen, Panos Kalnis, "Outsourced Similarity Search on Metric Data Assets," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 2, pp. 338-352, Feb. 2012, doi:10.1109/TKDE.2010.222