Issue No.05 - May (2013 vol.25)
pp: 1191-1195
Costas Panagiotakis, University of Crete, Ierapetra
Georgios Tziritas, University of Crete, Heraklion
ABSTRACT
In this paper, we propose an efficient clustering algorithm that has been applied to the microaggregation problem. The goal is to partition $N$ given records into clusters, each of them grouping at least $K$ records, so that the sum of the within-partition squared error (SSE) is minimized. We propose a Successive Group Selection algorithm that approximately solves the microaggregation problem in $O(N^2 \log N)$ time, based on sequential minimization of SSE. Experimental results and comparisons to existing methods with similar computation cost on real and synthetic data sets demonstrate the high performance and robustness of the proposed scheme.
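To make the objective concrete, the following minimal Python sketch evaluates the quantity the abstract describes: the SSE of a given partition, subject to every group holding at least $K$ records. It is not the Successive Group Selection algorithm itself, which the abstract does not specify in detail. The function names (`centroid`, `group_sse`, `partition_sse`) and the toy data are assumptions made purely for illustration.

```python
from typing import List, Sequence

Record = Sequence[float]


def centroid(group: List[Record]) -> List[float]:
    """Component-wise mean of the records in one group."""
    n = len(group)
    dim = len(group[0])
    return [sum(r[d] for r in group) / n for d in range(dim)]


def group_sse(group: List[Record]) -> float:
    """Sum of squared Euclidean distances from each record to the group centroid."""
    c = centroid(group)
    return sum(sum((x - m) ** 2 for x, m in zip(r, c)) for r in group)


def partition_sse(records: List[Record], groups: List[List[int]], k: int) -> float:
    """SSE of a partition given as lists of record indices (hypothetical helper).

    Raises ValueError if the groups do not cover every record exactly once
    or if any group holds fewer than k records.
    """
    covered = sorted(i for g in groups for i in g)
    if covered != list(range(len(records))):
        raise ValueError("groups must cover every record exactly once")
    if any(len(g) < k for g in groups):
        raise ValueError("every group must contain at least k records")
    return sum(group_sse([records[i] for i in g]) for g in groups)


# Toy data (assumed): four 2-D records, K = 2.
records = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = partition_sse(records, [[0, 1], [2, 3]], k=2)  # nearby records grouped
bad = partition_sse(records, [[0, 2], [1, 3]], k=2)   # distant records grouped
print(good, bad)
```

On this toy input, grouping nearby records gives a much lower SSE than grouping distant ones. A microaggregation method searches for a feasible partition that makes this value as small as possible.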
INDEX TERMS
Clustering algorithms, GSM, Vegetation, Indexes, Partitioning algorithms, Loss measurement, Minimization, microaggregation, Clustering, partition
CITATION
Costas Panagiotakis, Georgios Tziritas, "Successive Group Selection for Microaggregation", IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 5, pp. 1191-1195, May 2013, doi:10.1109/TKDE.2011.242