Practical Data-Oriented Microaggregation for Statistical Disclosure Control
January/February 2002 (vol. 14 no. 1)
pp. 189-201

Abstract: Microaggregation is a statistical disclosure control technique for microdata disseminated in statistical databases. Raw microdata (i.e., individual records or data vectors) are grouped into small aggregates prior to publication. Each aggregate should contain at least $k$ data vectors to prevent disclosure of individual information, where $k$ is a constant preset by the data protector. No exact polynomial-time algorithm is known to date for optimal microaggregation, i.e., microaggregation with minimal variability loss. Methods in the literature rank the data and partition them into fixed-size groups; in the multivariate case, ranking is performed by projecting data vectors onto a single axis. In this paper, candidate optimal solutions to the multivariate and univariate microaggregation problems are characterized. In the univariate case, two heuristics are introduced, one based on hierarchical clustering and one on genetic algorithms; both are data-oriented in that they try to preserve natural data aggregates. In the multivariate case, fixed-size and hierarchical clustering microaggregation algorithms are presented which do not require data to be projected onto a single dimension; such methods clearly reduce variability loss as compared to conventional multivariate microaggregation on projected data.
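
To make the conventional approach mentioned in the abstract concrete, the following is a minimal sketch of univariate fixed-size microaggregation: values are ranked, cut into consecutive groups of $k$, and each value is replaced by its group mean. It is not the data-oriented heuristics proposed in the paper. The function name `microaggregate_fixed_size`, the choice to let the last group absorb any remainder, and the use of the within-group sum of squared errors as the measure of variability loss are assumptions made for illustration.

```python
from statistics import mean


def microaggregate_fixed_size(values, k):
    """Rank-based fixed-size microaggregation of a list of numbers.

    Returns (aggregated, sse): the input values replaced by their group
    means (in the original order) and the within-group sum of squared
    errors, used here as an assumed measure of variability loss.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")

    # Rank the data: indices of the records sorted by value.
    order = sorted(range(len(values)), key=lambda i: values[i])

    n_groups = len(values) // k
    aggregated = [0.0] * len(values)
    sse = 0.0
    for g in range(n_groups):
        start = g * k
        # Assumption: the last group absorbs the remainder, so every
        # group holds at least k records (and fewer than 2k).
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        members = order[start:end]
        centroid = mean(values[i] for i in members)
        for i in members:
            aggregated[i] = centroid
            sse += (values[i] - centroid) ** 2
    return aggregated, sse
```

Calling `microaggregate_fixed_size(data, 3)` on a list of numbers returns the masked values together with the information lost by grouping. Because the cut points depend only on rank and $k$, a natural gap in the data can end up inside a group, which is the kind of loss the data-oriented methods of the paper aim to reduce.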

Index Terms:
statistical databases, microdata protection, statistical disclosure control, microaggregation, hierarchical clustering, genetic algorithms
Citation:
J. Domingo-Ferrer, J.M. Mateo-Sanz, "Practical Data-Oriented Microaggregation for Statistical Disclosure Control," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 1, pp. 189-201, Jan.-Feb. 2002, doi:10.1109/69.979982