Issue No. 11 - November (2011, vol. 23)
pp: 1704-1717
Keng-Pei Lin, National Taiwan University, Taipei
ABSTRACT
The support vector machine (SVM) is a widely used tool for classification problems. The SVM trains a classifier by solving an optimization problem that decides which instances of the training data set are support vectors, the informative instances needed to form the SVM classifier. Since support vectors are intact tuples taken from the training data set, releasing the SVM classifier for public use or shipping it to clients discloses the private content of the support vectors. This violates privacy-preserving requirements imposed for legal or commercial reasons. The classifier learned by the SVM thus inherently violates privacy, which restricts the applicability of the SVM. To the best of our knowledge, no prior work has extended the notion of privacy preservation to tackle this inherent privacy violation problem of the SVM classifier. In this paper, we address this privacy violation problem and propose an approach that postprocesses the SVM classifier to transform it into a privacy-preserving classifier that does not disclose the private content of support vectors. The postprocessed SVM classifier, which does not expose the private content of the training data, is called the Privacy-Preserving SVM Classifier (abbreviated as PPSVC). The PPSVC is designed for the commonly used Gaussian kernel function. It precisely approximates the decision function of the Gaussian kernel SVM classifier without exposing the sensitive attribute values possessed by support vectors. By applying the PPSVC, the SVM classifier can be publicly released while preserving privacy. We prove that the PPSVC is robust against adversarial attacks. Experiments on real data sets show that the classification accuracy of the PPSVC is comparable to that of the original SVM classifier.
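To make the idea concrete, the following is a minimal sketch of how a Gaussian kernel decision function could be rewritten so that evaluating it no longer needs the individual support vectors. The abstract does not state the mechanism, so the technique here is an assumption chosen for illustration: the kernel is factored as exp(-g(x-s)^2) = exp(-g x^2) exp(-g s^2) exp(2 g x s), and the last factor is replaced by a truncated Taylor series. The one-dimensional setting, the toy support vectors, the coefficients, the value of gamma, and all function names are hypothetical, and the sketch makes no privacy claim of its own.

```python
import math

def gaussian_decision(x, support_vectors, coeffs, bias, gamma):
    """Standard Gaussian-kernel decision value; requires the raw support vectors."""
    return sum(c * math.exp(-gamma * (x - s) ** 2)
               for c, s in zip(coeffs, support_vectors)) + bias

def aggregate_coefficients(support_vectors, coeffs, gamma, degree):
    """Fold all support vectors into one weight per monomial x**d.

    Assumed technique: truncate the Taylor series of exp(2*gamma*x*s)
    after the term of order `degree`.
    """
    return [
        (2 * gamma) ** d / math.factorial(d)
        * sum(c * math.exp(-gamma * s * s) * s ** d
              for c, s in zip(coeffs, support_vectors))
        for d in range(degree + 1)
    ]

def approx_decision(x, poly, bias, gamma):
    """Approximate decision value computed from the aggregated weights only."""
    return math.exp(-gamma * x * x) * sum(w * x ** d for d, w in enumerate(poly)) + bias

# Hypothetical toy model: 1-D inputs scaled to [0, 1].
SUPPORT_VECTORS = [0.2, 0.5, 0.9]   # would be private training tuples
COEFFS = [1.0, -1.5, 0.5]           # alpha_i * y_i for each support vector
BIAS = 0.1
GAMMA = 0.5

poly = aggregate_coefficients(SUPPORT_VECTORS, COEFFS, GAMMA, degree=8)

for x in (0.0, 0.3, 0.7, 1.0):
    exact = gaussian_decision(x, SUPPORT_VECTORS, COEFFS, BIAS, GAMMA)
    approx = approx_decision(x, poly, BIAS, GAMMA)
    print(f"x={x:.1f}  exact={exact:+.6f}  approx={approx:+.6f}")
```

In this sketch only `poly`, `BIAS`, and `GAMMA` would need to be shipped to evaluate `approx_decision`. The approximation quality depends on keeping 2 * gamma * x * s small, which is why the toy inputs are assumed to be scaled to [0, 1].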
INDEX TERMS
Privacy-preserving data mining, classification, support vector machines.
CITATION
Keng-Pei Lin, "On the Design and Analysis of the Privacy-Preserving SVM Classifier," IEEE Transactions on Knowledge & Data Engineering, vol. 23, no. 11, pp. 1704-1717, November 2011, doi:10.1109/TKDE.2010.193