A partition-based approach to support streaming updates over persistent data in an active datawarehouse
Parallel and Distributed Processing Symposium, International (2009)
May 23, 2009 to May 29, 2009
Abhirup Chakraborty , Department of Electrical and Computer Engineering, University of Waterloo, ON, Canada N2L 3G1
Ajit Singh , Department of Electrical and Computer Engineering, University of Waterloo, ON, Canada N2L 3G1
Active warehousing has emerged to meet high user demand for fresh, up-to-date information. Refreshing the warehouse online as source updates arrive adds processing and disk overhead to the warehouse transformations. This paper considers a frequently occurring operator in active warehousing: the join, using limited memory, between a fast, time-varying or bursty update stream S and a persistent disk relation R. Such a join is the crux of several common transformations in an active data warehouse (e.g., surrogate key assignment and duplicate detection). We propose a partition-based join algorithm that minimizes processing overhead, disk overhead, and output-tuple delay. The algorithm exploits the spatio-temporal locality within the update stream, reduces output delay by exploiting hot spots in the range or domain of the join attributes, and amortizes the I/O cost of reading relation R from disk over a batch of tuples from update stream S. We present experimental results showing the effectiveness of the proposed algorithm.
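The abstract describes the algorithm only at a high level. The following is a minimal, hypothetical Python sketch of the general idea, not the authors' algorithm. It assumes R is hash-partitioned on the join key, that `load_partition` stands in for one disk read of a partition, and that "hot" partitions are identified by a simple probe counter. The names `PartitionedStreamJoin`, `load_partition`, `batch_size`, and `cache_slots` are all illustrative. Stream tuples are buffered per partition. When a buffer fills, the matching partition of R is read once and joined with the whole batch, which amortizes the disk I/O. Tuples that hit a cached hot partition are joined immediately, with no buffering delay.

```python
from collections import defaultdict


class PartitionedStreamJoin:
    """Hypothetical sketch: join update stream S with disk relation R under limited memory.

    Assumptions (not from the paper): R is hash-partitioned on the join key,
    load_partition(pid) models one disk read, and hotness is a probe count.
    """

    def __init__(self, load_partition, num_partitions, batch_size, cache_slots):
        self.load_partition = load_partition
        self.num_partitions = num_partitions
        self.batch_size = batch_size          # stream tuples buffered per partition before a disk read
        self.cache_slots = cache_slots        # number of R partitions kept resident in memory
        self.buffers = defaultdict(list)      # partition id -> pending stream tuples
        self.hits = defaultdict(int)          # partition id -> probe count (hotness)
        self.cache = {}                       # partition id -> {key: [R rows]}
        self.disk_reads = 0

    def _pid(self, key):
        return hash(key) % self.num_partitions

    def _fetch(self, pid):
        """Return an in-memory index of R's partition pid, reading it from "disk" if not cached."""
        if pid in self.cache:
            return self.cache[pid]
        self.disk_reads += 1
        index = defaultdict(list)
        for r in self.load_partition(pid):
            index[r[0]].append(r)
        # Cache the partition if there is room, or if it is hotter than the coldest cached one.
        if len(self.cache) < self.cache_slots:
            self.cache[pid] = index
        elif self.cache:
            coldest = min(self.cache, key=lambda p: self.hits[p])
            if self.hits[pid] > self.hits[coldest]:
                del self.cache[coldest]
                self.cache[pid] = index
        return index

    def _flush(self, pid):
        """Join every buffered stream tuple of partition pid using a single partition read."""
        batch, self.buffers[pid] = self.buffers[pid], []
        index = self._fetch(pid)
        return [(s, r) for s in batch for r in index.get(s[0], [])]

    def process(self, s):
        """Accept one stream tuple s = (key, payload) and return any join results produced."""
        pid = self._pid(s[0])
        self.hits[pid] += 1
        if pid in self.cache:                 # hot partition: join at once, no buffering delay
            return [(s, r) for r in self.cache[pid].get(s[0], [])]
        self.buffers[pid].append(s)
        if len(self.buffers[pid]) >= self.batch_size:
            return self._flush(pid)
        return []

    def flush_all(self):
        """Drain all non-empty buffers, e.g. at the end of the stream."""
        out = []
        for pid in list(self.buffers):
            if self.buffers[pid]:
                out.extend(self._flush(pid))
        return out


# Usage: a toy R with integer keys, split into two hash partitions.
R = [(k, f"r{k}") for k in range(6)]
parts = defaultdict(list)
for r in R:
    parts[hash(r[0]) % 2].append(r)

join = PartitionedStreamJoin(lambda pid: parts.get(pid, []), num_partitions=2,
                             batch_size=3, cache_slots=1)
out = []
for s in [(0, "a"), (2, "b"), (4, "c"), (1, "d"), (2, "e")]:
    out.extend(join.process(s))
out.extend(join.flush_all())
print(len(out), join.disk_reads)  # 5 2
```

In this toy run, the first three stream tuples share one read of partition 0. That partition is then cached, so `(2, "e")` joins immediately. Only one more read is needed, to drain partition 1. The batch size trades output delay against I/O sharing, and the cache size trades memory against delay for hot keys.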
Abhirup Chakraborty, Ajit Singh, "A partition-based approach to support streaming updates over persistent data in an active datawarehouse", Parallel and Distributed Processing Symposium, International, pp. 1-11, 2009, doi:10.1109/IPDPS.2009.5161064