Range Aggregation With Set Selection
Issue No.05 - May (2014 vol.26)
pp: 1240-1252
Yufei Tao , Chinese Univ. of Hong Kong, Hong Kong, China
Cheng Sheng , Google, Zürich, Switzerland
Chin-Wan Chung , Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea
Jong-Ryul Lee , Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea
ABSTRACT
In the classic range aggregation problem, we have a set S of objects such that, given an interval I, a query counts how many objects of S are covered by I. Besides COUNT, the problem can also be defined with other aggregate functions, e.g., SUM, MIN, MAX and AVERAGE. This paper studies a novel variant of range aggregation in which an object can belong to multiple sets. A query (at runtime) picks any two sets and aggregates over their intersection. More formally, let S1, ..., Sm be m sets of objects. Given distinct set ids i, j and an interval I, a query reports how many objects in Si ∩ Sj are covered by I. We call this problem range aggregation with set selection (RASS). Its difficulty lies in the fact that the pair (i, j) has (m choose 2) = m(m − 1)/2 possible choices, which makes effective indexing a non-trivial task. The RASS problem can also be defined with other aggregate functions, and generalized so that a query chooses more than 2 sets. We develop a system, also called RASS, to support this type of query. Our system has excellent efficiency in both theory and practice. Theoretically, it consumes linear space and achieves nearly-optimal query time. Practically, it outperforms existing solutions on real datasets by up to an order of magnitude. The paper also features a rigorous theoretical analysis of the hardness of the RASS problem, which offers valuable insight into its characteristics.
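To make the query semantics concrete, here is a minimal Python sketch, assuming each object is a point on a line (its key) tagged with the ids of the sets it belongs to. `range_count` is the classic single-set COUNT via binary search; `NaiveRASS` is a hypothetical brute-force baseline for RASS with COUNT. It is not the index developed in the paper, whose linear-space, nearly-optimal design is not described in this abstract; all names here are illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]


def range_count(sorted_keys: List[float], interval: Interval) -> int:
    """Classic range COUNT on one set: how many keys fall in [lo, hi].

    Two binary searches on a sorted list, so O(log n) per query.
    """
    lo, hi = interval
    return bisect_right(sorted_keys, hi) - bisect_left(sorted_keys, lo)


class NaiveRASS:
    """Hypothetical brute-force baseline for RASS with COUNT (not the paper's index).

    A query (i, j, I) counts objects in S_i ∩ S_j whose key lies in I.
    """

    def __init__(self, objects: Dict[str, Tuple[float, Iterable[int]]]):
        # Group objects by set id: set id -> list of (key, object id).
        grouped: Dict[int, List[Tuple[float, str]]] = {}
        for oid, (key, set_ids) in objects.items():
            for s in set(set_ids):
                grouped.setdefault(s, []).append((key, oid))
        # Per set, keep parallel lists sorted by key (linear space overall
        # in the total number of object-set memberships).
        self.keys: Dict[int, List[float]] = {}
        self.ids: Dict[int, List[str]] = {}
        for s, members in grouped.items():
            members.sort()
            self.keys[s] = [k for k, _ in members]
            self.ids[s] = [oid for _, oid in members]

    def _in_range(self, s: int, interval: Interval) -> List[str]:
        # Object ids of set s whose key lies in [lo, hi].
        lo, hi = interval
        keys = self.keys.get(s, [])
        ids = self.ids.get(s, [])
        return ids[bisect_left(keys, lo):bisect_right(keys, hi)]

    def query(self, i: int, j: int, interval: Interval) -> int:
        if i == j:
            raise ValueError("set ids must be distinct")
        a = self._in_range(i, interval)
        b = self._in_range(j, interval)
        if len(a) > len(b):
            a, b = b, a
        # Intersect the two in-range candidate lists by object id.
        b_ids = set(b)
        return sum(1 for oid in a if oid in b_ids)


if __name__ == "__main__":
    print(range_count([1, 3, 5, 7, 9], (2, 7)))  # prints 3
    index = NaiveRASS({
        "o1": (1.0, [1, 2]),
        "o2": (3.0, [1, 2, 3]),
        "o3": (5.0, [1]),
        "o4": (7.0, [2, 3]),
        "o5": (9.0, [1, 2]),
    })
    print(index.query(1, 2, (0, 8)))   # prints 2 (o1, o2)
    print(index.query(2, 3, (4, 10)))  # prints 1 (o4)
```

This baseline uses linear space, but each query costs time proportional to |Si ∩ I| + |Sj ∩ I| (plus the binary searches), even when the answer is tiny. The opposite extreme, materializing a sorted list for every intersection Si ∩ Sj, answers each query with two binary searches but must prepare up to (m choose 2) lists, and an object belonging to k sets is copied into k(k − 1)/2 of them. This tension is one rough way to see why the number of possible pairs makes indexing non-trivial.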
INDEX TERMS
set theory, indexing, query processing, rendering (computer graphics), linear space, query counts, aggregate functions, range aggregation with set selection, RASS problem, rendering, nearly optimal query time, Silicon, Aggregates, Arrays, Facebook, Aging, Theory, Range Aggregation, Index
CITATION
Yufei Tao, Cheng Sheng, Chin-Wan Chung, Jong-Ryul Lee, "Range Aggregation With Set Selection", IEEE Transactions on Knowledge & Data Engineering, vol.26, no. 5, pp. 1240-1252, May 2014, doi:10.1109/TKDE.2013.125