Large data warehouses (DW) pose a major challenge for performance and scalability, as users expect near-instant answers to their queries. Traditional solutions, which rely on very expensive architectures and structures, cannot bring every complex aggregation query down to response times of minutes or seconds. The summary warehouse (SW) achieves such a speedup using only general-purpose sampling summaries that are well suited to aggregated exploratory analysis.
The major limitation of SWs results from the tradeoff between accuracy and speed: smaller, faster summaries cannot answer less-aggregated queries with sufficient accuracy. We propose a simple and inexpensive strategy that reconciles these conflicting requirements and delivers unprecedented speedup by exploiting the ubiquity of distributed computing. The distributed summaries (DS) approach proposed in this paper manages a distributed set of summaries placed on the available computing nodes of a local area network, achieving very fast query processing while guaranteeing sufficient accuracy.