20th International Conference on Data Engineering (ICDE'04)
Range CUBE: Efficient Cube Computation by Exploiting Data Correlation
Boston, Massachusetts
March 30-April 02
ISBN: 0-7695-2065-0
Ying Feng, University of California, Santa Barbara
Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. In this paper, we introduce Range Cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, the range trie, is used to identify and compress correlation in attribute values, and to compress the input dataset, which effectively reduces the computational cost. The range cubing algorithm generates a compressed cube, called a range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value and is expressed as a tuple with the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-Cubing, experiments on a real dataset show that range cubing runs in less than one thirtieth of the time while generating a range cube that takes less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
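The observation behind the compression, that correlated attribute values make many cells share one aggregation value, can be illustrated with a small brute-force sketch. This is not the range trie or the range cubing algorithm from the paper. It enumerates the full cube of a toy table and groups cells that cover exactly the same input tuples, which therefore have the same aggregate. The table, the column meanings (store, city, product, sales) and all function names are hypothetical, and the ranges defined in the paper may partition the cells more finely than these groups do.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marker for a rolled-up (aggregated) dimension


def full_cube(rows, n_dims):
    """Map every non-empty cell to the set of row indices it covers."""
    cells = defaultdict(set)
    for idx, row in enumerate(rows):
        dims = row[:n_dims]
        # Each row contributes to 2^n cells: every dimension is kept or rolled up.
        for mask in product((False, True), repeat=n_dims):
            cell = tuple(d if keep else ALL for d, keep in zip(dims, mask))
            cells[cell].add(idx)
    return cells


def group_equivalent_cells(cells):
    """Group cells that cover the same rows; each group shares one aggregate."""
    groups = defaultdict(list)
    for cell, covered in cells.items():
        groups[frozenset(covered)].append(cell)
    return groups


def aggregate(rows, covered, measure_index):
    """SUM of the measure column over the covered rows."""
    return sum(rows[i][measure_index] for i in covered)


# Hypothetical toy data: (store, city, product, sales).
# Store and city are perfectly correlated: s1 is always in c1, s2 in c2.
rows = [
    ("s1", "c1", "p1", 10),
    ("s1", "c1", "p2", 20),
    ("s2", "c2", "p1", 5),
]

cells = full_cube(rows, n_dims=3)
groups = group_equivalent_cells(cells)

for covered, members in groups.items():
    print(aggregate(rows, covered, 3), sorted(members))
```

In this toy table the cells (s1, *, *), (*, c1, *) and (s1, c1, *) land in the same group because store s1 always appears with city c1, so storing one aggregate for the group loses no information.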
Citation:
Ying Feng, Divyakant Agrawal, Amr El Abbadi, Ahmed Metwally, "Range CUBE: Efficient Cube Computation by Exploiting Data Correlation," 20th International Conference on Data Engineering (ICDE'04), p. 658, 2004