Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
ISBN: 978-1-4244-5445-7
pp: 625-636
Robson L. F. Cordeiro, Computer Science Department - ICMC, University of São Paulo, 400 Trabalhador Saocarlense Ave, São Carlos 668, Brazil
Agma J. M. Traina, Computer Science Department - ICMC, University of São Paulo, 400 Trabalhador Saocarlense Ave, São Carlos 668, Brazil
Christos Faloutsos, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
Caetano Traina, Computer Science Department - ICMC, University of São Paulo, 400 Trabalhador Saocarlense Ave, São Carlos 668, Brazil
ABSTRACT
We propose Multi-resolution Correlation Cluster detection (MrCC), a novel, scalable method for detecting correlation clusters in data with roughly 5 to 30 axes. Existing methods typically exhibit super-linear behavior in space or execution time. MrCC employs a novel data structure based on multi-resolution and improves on previous approaches as follows: (a) it finds clusters that stand out in the data in a statistical sense; (b) its running time and memory usage are linear in the number of data points and in the dimensionality of the subspaces where clusters exist; (c) its memory usage is linear and its running time quasi-linear in the dimensionality of the space; and (d) it is accurate, deterministic, and robust to noise, does not require the number of clusters as an input parameter, performs no distance calculations, and can detect clusters in subspaces formed by the original axes or by linear combinations of them, including rotations of the space. In experiments on synthetic data ranging from 5 to 30 axes and from 12k to 250k points, MrCC ran faster than five recent related methods, being on average 10 times faster than the competitors that also achieved high accuracy on every tested dataset. On real data, MrCC found clusters at least 9 times faster than the competitors and improved on their accuracy by up to 34 percent.
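The abstract does not describe MrCC's data structure in detail. As a rough illustration of two ideas it names (a multi-resolution structure built without any distance calculations, and clusters that "stand out" statistically), the Python sketch below counts points per grid cell at successively finer resolutions and flags cells whose counts exceed what a uniform distribution would predict. The function names `multiresolution_counts` and `dense_cells`, the assumption that coordinates are normalized to [0, 1], and the normal-approximation threshold `z` are all hypothetical choices for illustration; this is not the paper's algorithm or its statistical test, and it does not search subspaces or rotated axes.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]  # integer coordinates of a grid cell


def multiresolution_counts(points: Sequence[Sequence[float]], levels: int) -> List[Dict[Cell, int]]:
    """Count points per grid cell at resolutions 1..levels (hypothetical helper).

    Assumes every coordinate lies in [0, 1]. At level h each axis is split into
    2**h intervals, so each cell at level h has 2**d children at level h + 1.
    Only integer cell indices are computed, never point-to-point distances,
    so the cost is O(levels * n * d).
    """
    grids: List[Dict[Cell, int]] = []
    for h in range(1, levels + 1):
        side = 2 ** h
        counts: Counter = Counter()
        for p in points:
            # min() keeps a coordinate of exactly 1.0 inside the last interval.
            cell = tuple(min(int(x * side), side - 1) for x in p)
            counts[cell] += 1
        grids.append(dict(counts))
    return grids


def dense_cells(counts: Dict[Cell, int], n: int, side: int, dims: int, z: float = 3.0) -> List[Cell]:
    """Flag cells whose count exceeds the uniform expectation by z standard deviations.

    Assumption: under uniform data a cell's count is Binomial(n, p) with
    p = 1 / side**dims; the threshold uses the normal approximation.
    """
    p = 1.0 / side ** dims
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return [cell for cell, k in counts.items() if k > mean + z * std]


if __name__ == "__main__":
    # 50 points piled near (0.1, 0.1) plus 3 scattered points in 2-D.
    pts = [(0.1, 0.1)] * 50 + [(0.9, 0.3), (0.4, 0.8), (0.7, 0.6)]
    grids = multiresolution_counts(pts, levels=2)
    print(grids[0])
    print(dense_cells(grids[1], n=len(pts), side=4, dims=2))
```

Running the example prints the level-1 counts `{(0, 0): 50, (1, 0): 1, (0, 1): 1, (1, 1): 1}` and then `[(0, 0)]`, the only level-2 cell whose count clears the uniform-data threshold. Because each level refines the previous one by halving every interval, coarse levels summarize the data cheaply while finer levels localize dense regions.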
CITATION
Robson L. F. Cordeiro, Agma J. M. Traina, Christos Faloutsos, Caetano Traina, "Finding Clusters in Subspaces of Very Large, Multi-Dimensional Datasets", IEEE International Conference on Data Engineering (ICDE 2010), pp. 625-636, doi:10.1109/ICDE.2010.5447924