19th International Conference on Scientific and Statistical Database Management (SSDBM 2007) Adaptive-Size Reservoir Sampling over Data Streams Banff, Alberta, Canada July 09-July 11 ISBN: 0-7695-2868-6
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SSDBM.2007.29
Reservoir sampling is a well-known technique for sequential random sampling over data streams. Conventional reservoir sampling assumes a fixed-size reservoir. There are situations, however, in which it is necessary and/or advantageous to adaptively adjust the size of a reservoir in the middle of sampling due to changes in data characteristics and/or application behavior. This paper studies adaptivesize reservoir sampling over data streams considering two main factors: reservoir size and sample uniformity. First, the paper conducts a theoretical study on the effects of adjusting the size of a reservoir while sampling is in progress. The theoretical results show that such an adjustment may bring a negative impact on the probability of the sample being uniform (called uniformity confidence herein). Second, the paper presents a novel algorithm for maintaining the reservoir sample after the reservoir size is adjusted such that the resulting uniformity confidence exceeds a given threshold. Third, the paper extends the proposed algorithm to an adaptive multi-reservoir sampling algorithm for a practical application in which samples are collected from memory-limited wireless sensor networks using a mobile sink. Finally, the paper empirically examines the adaptivity of the multi-reservoir sampling algorithm with regard to reservoir size and sample uniformity using real sensor networks data sets.
Citation:
Mohammed Al-Kateb, Byung Suk Lee, X. Sean Wang, "Adaptive-Size Reservoir Sampling over Data Streams," ssdbm, pp.22, 19th International Conference on Scientific and Statistical Database Management (SSDBM 2007), 2007 Usage of this product signifies your acceptance of the Terms of Use. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||