loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
International Parallel and Distributed Processing Symposium (IPDPS'03)
Impact of Data Distribution, Level of Parallelism, and Communication Frequency on Parallel Data Cube Construction
Nice, France
April 22-April 26
ISBN: 0-7695-1926-1
Ge Yang, Ohio State University
Ruoming Jin, Ohio State University
Gagan Agrawal, Ohio State University

Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation.

We have developed a set of parallel algorithms for data cube construction using a new data structure called aggregation tree. Our experience has shown that a number of performance tradeoffs arise in developing a parallel data cube implementation. In this paper, we focus on three important issues, which are: 1) Data distribution, i.e., how the original array is distributed among the processors, 2) Level of parallelism, i.e., what parts of the computation are parallelized and sequentialized, and 3) Frequency of communication, i.e., does the implementation require frequent interprocessor communication (and less memory) or less frequent communication (and more memory).

We present a detailed experimental study evaluating the above tradeoffs. We consider parallel data cube construction with different cube sizes and sparsity levels. Our experimental results show the following: 1) In all cases, reducing the frequency of communication and using higher memory gave better performance, though the difference was relatively small. 2) Choosing data distribution to minimize communication volume made a substantial difference in the performance in most of the cases. 3) Finally, using parallelism at all levels gave better performance, even though it increases the total communication volume.

Citation:
Ge Yang, Ruoming Jin, Gagan Agrawal, "Impact of Data Distribution, Level of Parallelism, and Communication Frequency on Parallel Data Cube Construction," ipdps, pp.66a, International Parallel and Distributed Processing Symposium (IPDPS'03), 2003
Usage of this product signifies your acceptance of the Terms of Use.