Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation.
We have developed a set of parallel algorithms for data cube construction using a new data structure called aggregation tree. Our experience has shown that a number of performance tradeoffs arise in developing a parallel data cube implementation. In this paper, we focus on three important issues, which are: 1) Data distribution, i.e., how the original array is distributed among the processors, 2) Level of parallelism, i.e., what parts of the computation are parallelized and sequentialized, and 3) Frequency of communication, i.e., does the implementation require frequent interprocessor communication (and less memory) or less frequent communication (and more memory).
We present a detailed experimental study evaluating the above tradeoffs. We consider parallel data cube construction with different cube sizes and sparsity levels. Our experimental results show the following: 1) In all cases, reducing the frequency of communication and using higher memory gave better performance, though the difference was relatively small. 2) Choosing data distribution to minimize communication volume made a substantial difference in the performance in most of the cases. 3) Finally, using parallelism at all levels gave better performance, even though it increases the total communication volume.