Dec. 6, 2009 to Dec. 9, 2009
DOI: http://doi.ieeecomputersociety.org/10.1109/ICDM.2009.12
This work examines under what conditions compression methodologies can retain the outcome of clustering operations. We focus on the popular k-Means clustering algorithm and demonstrate how a properly constructed compression scheme based on post-clustering quantization can maintain the global cluster structure. Our analytical derivations indicate that a 1-bit moment-preserving quantizer per cluster is sufficient to retain the original data clusters. Merits of the proposed compression technique include: a) reduced storage requirements with clustering guarantees, b) privacy of the original data values, and c) shape preservation for data visualization purposes. We evaluate the quantization scheme on various high-dimensional datasets, including 1-dimensional and 2-dimensional time-series (shape datasets), and demonstrate the cluster preservation property. We also compare against previously proposed simplification techniques in the time-series area and show significant improvements in both the clustering and the shape preservation of the compressed datasets.
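The abstract does not spell out the quantizer itself. As a hedged illustration only, the sketch below assumes the classic two-level moment-preserving quantizer from block truncation coding, applied independently to each dimension of each cluster: values are thresholded at the mean μ, and if q of the n values lie above it, they are reconstructed as μ + σ√((n-q)/q), the rest as μ - σ√(q/(n-q)). This keeps each cluster's per-dimension mean and variance exactly, so the k-Means centroids are unchanged; the sketch does not reproduce the paper's analysis of when point-to-cluster assignments are also retained. The function names (`moment_preserving_1bit`, `quantize_clusters`, `centroids`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def moment_preserving_1bit(values):
    """Two-level (1 bit per value) quantizer that keeps the sample mean and
    population variance of `values` unchanged. Returns (bits, low, high)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    bits = [1 if v > mu else 0 for v in values]  # threshold at the mean
    q = sum(bits)  # how many values map to the high level
    if q == 0 or q == n:
        # Degenerate case: every value falls on one side of the mean, which only
        # happens when all values are equal (up to rounding); use a single level.
        return bits, mu, mu
    low = mu - sigma * math.sqrt(q / (n - q))
    high = mu + sigma * math.sqrt((n - q) / q)
    return bits, low, high


def quantize_clusters(points, labels):
    """Quantize each dimension of each cluster independently (assumed scheme).
    Storage: 1 bit per coordinate plus (low, high) per cluster and dimension."""
    members = defaultdict(list)
    for i, lab in enumerate(labels):
        members[lab].append(i)
    recon = [list(p) for p in points]
    for idx in members.values():
        for d in range(len(points[0])):
            bits, low, high = moment_preserving_1bit([points[i][d] for i in idx])
            for i, b in zip(idx, bits):
                recon[i][d] = high if b else low
    return recon


def centroids(points, labels):
    """Per-cluster mean vectors, i.e. the k-Means centroids for these labels."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {lab: [sum(col) / len(ps) for col in zip(*ps)] for lab, ps in groups.items()}


# Toy 2-D data; labels are hand-assigned here in place of a real k-Means run.
points = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.4], [8.0, 8.5], [9.1, 7.9], [8.6, 9.3]]
labels = [0, 0, 0, 1, 1, 1]
recon = quantize_clusters(points, labels)
print(centroids(points, labels))
print(centroids(recon, labels))  # equal up to floating-point rounding
```

Quantizing after clustering, rather than before, is what lets the per-cluster statistics be preserved: the two reconstruction levels are fitted to each cluster's own mean and spread, so the compressed points stay centred exactly where the original cluster was.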
Keywords: moment-preserving quantization, privacy preservation, clustering preservation
Deepak S. Turaga, Michail Vlachos, Olivier Verscheure, "On K-Means Cluster Preservation Using Quantization Schemes", 2009 Ninth IEEE International Conference on Data Mining (ICDM), 2009, pp. 533-542, doi:10.1109/ICDM.2009.12