2011 IEEE Sixth International Conference on Networking, Architecture, and Storage (2011)
Dalian, Liaoning, China
July 28, 2011 to July 30, 2011
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/NAS.2011.37
Detecting duplicates over sliding windows is an important technique for monitoring and analysing data streams. Since recording the exact information of elements in a sliding window can be RAM-intensive and can introduce unacceptable search complexity, several approximate membership representation schemes have been proposed to build fast in-memory indices. However, challenges in RAM utilization and scalability remain. This paper proposes a Detached Counting Bloom filter Array (DCBA) to flexibly and efficiently detect duplicates over sliding windows. A DCBA consists of an array of detached counting Bloom filters (DCBFs), where each DCBF is essentially a Bloom filter associated with a detached timer (counter) array. The DCBA scheme functions as a circular FIFO queue and keeps a filling DCBF for accommodating fresh elements and a decaying DCBF for evicting stale elements. DCBA allows the timer arrays belonging to fully filled DCBFs to be offloaded to disk, which greatly improves memory space efficiency. The fully filled DCBFs remain stable until their elements become stale, which allows a DCBA to be efficiently replicated for data reliability or information sharing. Further, a DCBA can be cooperatively maintained by clustered nodes, which provides a scalable solution for mining massive data streams. Mathematical analysis and experimental results show that a DCBA (containing 64 DCBFs) requires less than 10% of its components to be kept in RAM while maintaining more than 95% of its query performance, which significantly outperforms existing schemes in memory efficiency and scalability.
duplicate detection, sliding window, data stream, Bloom filter
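The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names, the SHA-256 double hashing, the rule that a timer stores the arrival offset of the latest element to set a bit, and the sizing of each filter at `window // (num_filters - 1)` elements are all assumptions made here for illustration. Everything is kept in memory; offloading timer arrays to disk is not modelled.

```python
import hashlib
from collections import deque


class DCBF:
    """One Bloom filter plus a separate ("detached") timer array (hypothetical layout)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # membership bits, needed for every query
        self.timers = [0] * num_bits      # only touched while filling or decaying
        self.count = 0                    # number of elements inserted so far

    def _positions(self, item):
        # Double hashing over a SHA-256 digest (an assumption of this sketch).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        self.count += 1
        for p in self._positions(item):
            self.bits[p] = 1
            self.timers[p] = self.count   # latest arrival offset that uses this bit

    def contains(self, item):
        return all(self.bits[p] for p in self._positions(item))

    def decay(self, elapsed):
        # Clear every bit whose most recent user arrived at offset <= elapsed.
        for p in range(self.num_bits):
            if self.bits[p] and self.timers[p] <= elapsed:
                self.bits[p] = 0


class DCBA:
    """Circular FIFO of DCBFs: the newest is filling, the oldest is decaying."""

    def __init__(self, window, num_filters, num_bits, num_hashes):
        # One filter is partly filled while another is partly decayed, so the
        # window spans (num_filters - 1) filters' worth of elements.
        self.capacity = window // (num_filters - 1)
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.filters = deque(DCBF(num_bits, num_hashes) for _ in range(num_filters))

    def contains(self, item):
        return any(f.contains(item) for f in self.filters)

    def insert(self, item):
        if self.filters[-1].count == self.capacity:
            # The oldest filter has fully decayed; retire it and start a fresh one.
            self.filters.popleft()
            self.filters.append(DCBF(self.num_bits, self.num_hashes))
        filling = self.filters[-1]
        filling.add(item)
        # Each arrival in the filling filter expires one offset in the oldest filter.
        self.filters[0].decay(filling.count)

    def seen(self, item):
        """Report whether item is (probably) in the window, then record it."""
        duplicate = self.contains(item)
        self.insert(item)
        return duplicate
```

In this sketch only the newest and oldest filters ever read or write their `timers`, which is the property that would let the timer arrays of the fully filled filters in between live outside RAM. The full-array scan in `decay` is chosen for brevity; a practical version would need a cheaper way to find expiring bits. As with any Bloom filter, `seen` can report false positives but, under the assumptions above, does not miss an element that is still inside the window.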
D. Feng, J. Wei, K. Zhou, H. Jiang and H. Wang, "Detecting Duplicates over Sliding Windows with RAM-Efficient Detached Counting Bloom Filter Arrays," 2011 IEEE Sixth International Conference on Networking, Architecture, and Storage (NAS), Dalian, Liaoning, China, 2011, pp. 382-391.