Minimum Spanning Tree Partitioning Algorithm for Microaggregation
July 2005 (vol. 17 no. 7)
pp. 902-911
This paper presents a clustering algorithm for partitioning a minimum spanning tree with a constraint on minimum group size. The problem is motivated by microaggregation, a disclosure limitation technique in which similar records are aggregated into groups containing a minimum of k records. Heuristic clustering methods are needed since the minimum information loss microaggregation problem is NP-hard. Our MST partitioning algorithm for microaggregation is sufficiently efficient to be practical for large data sets and yields results that are comparable to the best available heuristic methods for microaggregation. For data that contain pronounced clustering effects, our method results in significantly lower information loss. Our algorithm is general enough to accommodate different measures of information loss and can be used for other clustering applications that have a constraint on minimum group size.
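The idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not spell out. It is a simplified stand-in that assumes Euclidean distance, builds the tree with a plain O(n^2) Prim's method, and cuts tree edges from longest to shortest whenever both resulting groups keep at least k records. The names `prim_mst`, `partition_mst`, and `sse` are hypothetical, and the within-group sum of squared errors is used here only as one common choice of information-loss measure.

```python
from math import dist


def prim_mst(points):
    """Return MST edges (length, u, v) of the complete Euclidean graph."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    # best[i] = (length of cheapest edge joining i to the tree, tree endpoint)
    best = [(float("inf"), -1)] * n
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def component(start, adj):
    """Collect the nodes reachable from start in the adjacency map."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def partition_mst(points, k):
    """Assumed heuristic: cut MST edges, longest first, keeping groups >= k."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    edges = prim_mst(points)
    for _, u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for _, u, v in sorted(edges, reverse=True):
        adj[u].discard(v)
        adj[v].discard(u)
        if len(component(u, adj)) < k or len(component(v, adj)) < k:
            # The cut would create an undersized group, so restore the edge.
            adj[u].add(v)
            adj[v].add(u)
    groups, seen = [], set()
    for i in range(n):
        if i not in seen:
            comp = component(i, adj)
            seen |= comp
            groups.append(sorted(comp))
    return groups


def sse(points, groups):
    """Within-group sum of squared distances to each group's centroid."""
    total = 0.0
    for g in groups:
        dims = len(points[g[0]])
        centroid = [sum(points[i][d] for i in g) / len(g) for d in range(dims)]
        total += sum(dist(points[i], centroid) ** 2 for i in g)
    return total


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    groups = partition_mst(pts, 3)
    print(groups, sse(pts, groups))
```

Because groups only shrink as edges are cut, an edge rejected once would be rejected again later, so a single pass in descending order is enough for this sketch. It sets no upper bound on group size, so a practical microaggregation method would typically need a further step to split oversized groups.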

Index Terms: Clustering, partitioning, minimum spanning tree, microdata protection, disclosure control.
Michael Laszlo, Sumitra Mukherjee, "Minimum Spanning Tree Partitioning Algorithm for Microaggregation," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 7, pp. 902-911, July 2005, doi:10.1109/TKDE.2005.112