Issue No.03 - Third Quarter (2012 vol.5)
pp: 373-386
Benjamin C.M. Fung, Concordia University, Montreal
Thomas Trojer, University of Innsbruck, Innsbruck
Patrick C.K. Hung, University of Ontario Institute of Technology (UOIT), Oshawa
Li Xiong, Emory University, Atlanta
Khalil Al-Hussaeni, Concordia University, Montreal
Rachida Dssouli, Concordia University, Montreal and United Arab Emirates University
ABSTRACT
Mashup is a web technology that allows different service providers to flexibly integrate their expertise and to deliver highly customizable services to their customers. Data mashup is a special type of mashup application that integrates data from multiple data providers depending on the user's request. However, integrating data from multiple sources brings about three challenges: 1) Simply joining multiple private data sets together would reveal sensitive information to the other data providers. 2) The integrated (mashup) data could sharpen the identification of individuals and, therefore, reveal person-specific sensitive information that was not available before the mashup. 3) The mashup data from multiple sources often contain many data attributes. When a traditional privacy model such as K-anonymity is enforced, high-dimensional data suffer from the problem known as the curse of high dimensionality, which leaves the data useless for further analysis. In this paper, we study and resolve a privacy problem in a real-life mashup application for the online advertising industry in social networks, and propose a service-oriented architecture along with a privacy-preserving data mashup algorithm to address the aforementioned challenges. Experiments on real-life data suggest that our proposed architecture and algorithm are effective for simultaneously preserving both privacy and information utility on the mashup data. To the best of our knowledge, this is the first work that integrates high-dimensional data for a mashup service.
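Challenge 2 can be illustrated with a small sketch. The code below is not the algorithm proposed in the paper; it is a plain K-anonymity check applied to hypothetical toy data, in which two providers each hold a table that is 2-anonymous on its own, while the joined (mashup) table is not. The attribute names, the record values, and the helper functions `min_group_size` and `is_k_anonymous` are all assumptions made for illustration.

```python
from collections import Counter


def min_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same
    combination of quasi-identifier values."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return min_group_size(records, quasi_identifiers) >= k


# Hypothetical toy data: two providers describe the same four individuals,
# keyed by a shared record id. None of these values come from the paper.
provider_a = {
    1: {"job": "engineer", "sex": "F"},
    2: {"job": "engineer", "sex": "F"},
    3: {"job": "lawyer", "sex": "M"},
    4: {"job": "lawyer", "sex": "M"},
}
provider_b = {
    1: {"age": "30-39"},
    2: {"age": "40-49"},
    3: {"age": "30-39"},
    4: {"age": "40-49"},
}

# A naive mashup: join the two tables on the shared record id.
mashup = [{**provider_a[i], **provider_b[i]} for i in provider_a]

a_ok = is_k_anonymous(list(provider_a.values()), ["job", "sex"], k=2)
b_ok = is_k_anonymous(list(provider_b.values()), ["age"], k=2)
mashup_ok = is_k_anonymous(mashup, ["job", "sex", "age"], k=2)

print(a_ok, b_ok, mashup_ok)
```

In this toy case each provider's table passes the check for K = 2, but every combination of job, sex, and age in the joined table is unique, so the mashup identifies individuals more sharply than either source alone. Adding further attributes only shrinks the groups, which hints at why high-dimensional data are hard to anonymize without heavy information loss.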
INDEX TERMS
Mashups, Data privacy, Privacy, Couplings, Data models, Companies, Social network services, high dimensionality, privacy protection, anonymity, data mashup, data integration, service-oriented architecture
CITATION
Benjamin C.M. Fung, Thomas Trojer, Patrick C.K. Hung, Li Xiong, Khalil Al-Hussaeni, Rachida Dssouli, "Service-Oriented Architecture for High-Dimensional Private Data Mashup", IEEE Transactions on Services Computing, vol. 5, no. 3, pp. 373-386, Third Quarter 2012, doi:10.1109/TSC.2011.13