IEEE Transactions on Cloud Computing

From the April-June 2015 issue

FastRAQ: A Fast Approach to Range-Aggregate Queries in Big Data Environments

By Xiaochun Yun, Guangjun Wu, Guangyan Zhang, Keqin Li, and Shupeng Wang

Range-aggregate queries apply a given aggregate function to all tuples within given query ranges. Existing approaches to range-aggregate queries cannot provide accurate results quickly enough in big data environments. In this paper, we propose FastRAQ, a fast approach to range-aggregate queries in big data environments. FastRAQ first divides big data into independent partitions with a balanced partitioning algorithm, and then generates a local estimation sketch for each partition. When a range-aggregate query request arrives, FastRAQ obtains the result directly by summarizing local estimates from all partitions. FastRAQ has $O(1)$ time complexity for data updates and $O(\frac{N}{P \times B})$ time complexity for range-aggregate queries, where $N$ is the number of distinct tuples for all dimensions, $P$ is the number of partitions, and $B$ is the number of buckets in the histogram. We implement the FastRAQ approach on the Linux platform and evaluate its performance with about 10 billion data records. Experimental results demonstrate that FastRAQ provides range-aggregate query results within a time period two orders of magnitude lower than that of Hive, while the relative error is less than 3 percent within the given confidence interval.
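The partition-then-summarize idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the round-robin split stands in for the paper's balanced partitioning algorithm, the equi-width histogram stands in for its local estimation sketch, and every name here (`PartitionSketch`, `build_sketches`, `range_count`) is hypothetical. It handles only a one-dimensional COUNT aggregate, and its query step scans every bucket, so it does not reproduce the query complexity stated above.

```python
class PartitionSketch:
    """Hypothetical per-partition sketch: an equi-width histogram over [lo, hi)."""

    def __init__(self, lo, hi, num_buckets):
        self.lo = lo
        self.width = (hi - lo) / num_buckets
        self.counts = [0] * num_buckets

    def update(self, value):
        # O(1) update: the bucket index is computed directly from the value.
        idx = int((value - self.lo) / self.width)
        idx = min(max(idx, 0), len(self.counts) - 1)  # clamp out-of-range values
        self.counts[idx] += 1

    def estimate_count(self, q_lo, q_hi):
        # Estimate how many values fall in [q_lo, q_hi), assuming values are
        # spread uniformly inside each bucket (an assumption of this sketch).
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = max(0.0, min(q_hi, b_hi) - max(q_lo, b_lo))
            total += count * overlap / self.width
        return total


def build_sketches(values, num_partitions, lo, hi, num_buckets):
    # Round-robin assignment is a simple stand-in for balanced partitioning.
    sketches = [PartitionSketch(lo, hi, num_buckets) for _ in range(num_partitions)]
    for i, value in enumerate(values):
        sketches[i % num_partitions].update(value)
    return sketches


def range_count(sketches, q_lo, q_hi):
    # The global answer is the sum of the local estimates from all partitions.
    return sum(s.estimate_count(q_lo, q_hi) for s in sketches)


if __name__ == "__main__":
    sketches = build_sketches(range(100), num_partitions=4, lo=0, hi=100, num_buckets=10)
    print(range_count(sketches, 20, 50))
```

Because each partition keeps its own sketch, updates touch a single bucket in a single partition, and a query only needs the per-partition estimates rather than the raw records.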

NOTE: We seek submissions of papers that present new, original, and innovative ideas for the "first" time in TCC (Transactions on Cloud Computing). Submission of "extended versions" of already published works (e.g., conference or workshop papers) is therefore not encouraged unless they contain a significant number of "new and original" ideas or contributions along with more than 65% brand "new" material. If you are submitting an extended version, you SHOULD submit a cover letter or document that (1) gives a "Summary of Differences" between the TCC paper and the earlier paper, (2) clearly lists the "new and original" ideas or contributions in the TCC paper (identifying the sections where they are proposed or presented), and (3) confirms the percentage of new material. Otherwise, the submission will be "desk" rejected without review.

