Data Cube Materialization and Mining over MapReduce
Oct. 2012 (vol. 24 no. 10)
pp. 1747-1759
Arnab Nandi, The Ohio State University, Columbus
Cong Yu, Google Research, New York
Philip Bohannon, Facebook, Menlo Park
Raghu Ramakrishnan, Microsoft, Redmond
Computing interesting measures for data cubes and subsequent mining of interesting cube groups over massive data sets are critical for many important analyses done in the real world. Previous studies have focused on algebraic measures such as SUM that are amenable to parallel computation and can easily benefit from the recent advancement of parallel computing infrastructure such as MapReduce. Dealing with holistic measures such as TOP-K, however, is nontrivial. In this paper, we detail real-world challenges in cube materialization and mining tasks on web-scale data sets. Specifically, we identify an important subset of holistic measures and introduce MR-Cube, a MapReduce-based framework for efficient cube computation and identification of interesting cube groups on holistic measures. We provide extensive experimental analyses over both real and synthetic data. We demonstrate that, unlike existing techniques, which cannot scale to the 100-million-tuple mark for our data sets, MR-Cube successfully and efficiently computes cubes with holistic measures over billion-tuple data sets.
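The distinction between algebraic and holistic measures can be illustrated with a small sketch. It is not taken from the paper: the partitions and values are hypothetical, and MEDIAN is used as a stand-in holistic measure because it is easier to show in a few lines than TOP-K. The point is only that partial SUMs computed on separate workers can be combined exactly, while partial MEDIANs in general cannot.

```python
from statistics import median

# Hypothetical measure values for one cube group, split across two workers.
partition_a = [1, 2, 9]
partition_b = [3, 4]

# Algebraic measure: combining the per-partition SUMs gives the exact
# SUM over the whole group, so each worker only ships one number.
partial_sums = [sum(partition_a), sum(partition_b)]
total_sum = sum(partial_sums)

# Holistic measure (MEDIAN as an illustrative stand-in): combining the
# per-partition medians does not, in general, give the true median.
partial_medians = [median(partition_a), median(partition_b)]
median_of_medians = median(partial_medians)

# The true median needs all values of the group in one place.
true_median = median(partition_a + partition_b)

print("sum from partials:", total_sum)
print("median of partial medians:", median_of_medians)
print("true median:", true_median)
```

With these values the combined sum matches the sum over all values, while the median of the partial medians differs from the true median. For measures of this kind, all values of a group generally have to be brought together before the measure can be computed, which is why very large groups are a scaling concern.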
Index Terms:
Lattices, Cities and towns, USA Councils, Data mining, Knowledge engineering, Data engineering, Algorithm design and analysis, holistic measures, Data cube, cube materialization, cube mining, MapReduce
Citation:
Arnab Nandi, Cong Yu, Philip Bohannon, Raghu Ramakrishnan, "Data Cube Materialization and Mining over MapReduce," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 10, pp. 1747-1759, Oct. 2012, doi:10.1109/TKDE.2011.257