2014 Second International Symposium on Computing and Networking (CANDAR) (2014)
Dec. 10, 2014 to Dec. 12, 2014
Data deduplication technology usually identifies redundant data quickly and correctly by using bloom filter technology. A bloom filter can determine whether there is redundant data. However, there are the presences of false positives. In order to avoid false positives, we need to compare a new chunk with chunks that have been stored. In order to reduce the time to exclude the bloom filter false positives, current research uses many small size index tables to store chunk ID. However, the target chunk ID only stores in one index table. Searching for the target chunk ID at another index table uselessly took a great deal of time. In this paper, we cluster the stored chunks to reduce the time of excluding the false positive problem induced by bloom filter.
Indexes, Arrays, Multimedia communication, Linux, Kernel, Cloud computing, System performance
C. Tseng, J. Ciou and T. Liu, "A Cluster-Based Data De-duplication Technology," 2014 Second International Symposium on Computing and Networking (CANDAR), Shizuoka, Japan, 2014, pp. 226-230.