Issue No. 12 - December (2006 vol. 18)
ISSN: 1041-4347
pp: 1585-1599
Yixin Chen , Computer Science Department, Washington University in St. Louis, One Brookings Dr., St. Louis, MO 63130
Guozhu Dong , Computer Science Department, Wright State University, 3640 Colonel Glenn Hwy., Dayton, OH 45435
Jiawei Han , Computer Science Department, University of Illinois, 202 N. Goodwin St., Urbana, IL 61801
Jian Pei , Computer Science Department, Simon Fraser University, 8888 University Drive, Burnaby, BC Canada V5A 1S6
Benjamin W. Wah , Electrical and Computer Engineering Department, University of Illinois, 1308 Main St., Urbana, IL 61801
Jianyong Wang , Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
ABSTRACT
As OLAP engines are widely used to support multidimensional data analysis, it is desirable for data cubes to support advanced statistical measures, such as regression and filtering, in addition to traditional simple measures such as count and average. Such measures allow users to model, smooth, and predict trends and patterns in the data. Existing algorithms for simple distributive and algebraic measures are inadequate for the efficient computation of statistical measures in a multidimensional space. In this paper, we propose a fundamentally new class of measures, compressible measures, to support efficient computation of statistical models. For compressible measures, we compress each cell into an auxiliary matrix whose size is independent of the number of tuples in the cell. We can then compute the statistical measures for any data cell from the compressed data of its lower-level cells without accessing the raw data. Time- and space-efficient lossless aggregation formulae are derived for regression and filtering measures. Our analytical and experimental studies show that the resulting system, the regression cube, substantially reduces memory usage and overall response time for statistical analysis of multidimensional data.
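The abstract does not give the aggregation formulae themselves. As a rough illustration of what "compress each cell, then aggregate without raw data" can mean for a regression measure, the sketch below assumes the simplest case, an ordinary least-squares line y = b0 + b1 x, and assumes each cell is compressed into its normal-equation matrices X^T X and X^T y. This is a standard sufficient-statistics construction, not necessarily the representation derived in the paper; the names `CellSummary`, `from_points`, `merge`, and `coefficients` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CellSummary:
    """Hypothetical compressed cell for the model y = b0 + b1 * x.

    Holds X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]; the summary
    is five numbers no matter how many tuples fall into the cell.
    """
    n: float = 0.0
    sx: float = 0.0
    sxx: float = 0.0
    sy: float = 0.0
    sxy: float = 0.0

    @classmethod
    def from_points(cls, points: Iterable[Tuple[float, float]]) -> "CellSummary":
        # Compress the raw (x, y) tuples of a base cell in one pass.
        n = sx = sxx = sy = sxy = 0.0
        for x, y in points:
            n += 1
            sx += x
            sxx += x * x
            sy += y
            sxy += x * y
        return cls(n, sx, sxx, sy, sxy)

    def merge(self, other: "CellSummary") -> "CellSummary":
        # Lossless roll-up: for disjoint child cells, the parent's X^T X and
        # X^T y are the element-wise sums of the children's matrices.
        return CellSummary(self.n + other.n, self.sx + other.sx,
                           self.sxx + other.sxx, self.sy + other.sy,
                           self.sxy + other.sxy)

    def coefficients(self) -> Tuple[float, float]:
        # Solve the 2x2 normal equations (X^T X) b = X^T y by Cramer's rule.
        det = self.n * self.sxx - self.sx * self.sx
        if det == 0:
            raise ValueError("degenerate cell: need two or more distinct x values")
        b1 = (self.n * self.sxy - self.sx * self.sy) / det
        b0 = (self.sy - b1 * self.sx) / self.n
        return b0, b1


# Usage: two base cells whose points lie on y = 1 + 2x.
cell_a = CellSummary.from_points([(0, 1), (1, 3)])
cell_b = CellSummary.from_points([(2, 5), (3, 7)])
parent = cell_a.merge(cell_b)  # roll up without revisiting raw tuples
print(parent.coefficients())  # (1.0, 2.0)
```

Because the merged summary equals the summary computed directly from all raw tuples, the fitted coefficients of the parent cell are exactly those of a fit over the raw data, which is the sense in which such an aggregation is lossless.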
INDEX TERMS
data analysis, data compression, data mining, data warehouses, regression analysis
CITATION

Yixin Chen, Guozhu Dong, Jiawei Han, Jian Pei, Benjamin W. Wah and Jianyong Wang, "Regression Cubes with Lossless Compression and Aggregation," in IEEE Transactions on Knowledge & Data Engineering, vol. 18, no. 12, pp. 1585-1599, 2006.
doi:10.1109/TKDE.2006.196