2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS) (2016)
Wuhan, Hubei, China
Dec. 13, 2016 to Dec. 16, 2016
Data compression is widely used in storage systems to reduce redundant data and thus save storage space. One challenge facing traditional compression approaches is the limited size of the compression window, which prevents them from reducing redundancy globally. In this paper, we present DEC, a Deduplication-Enhanced Compression approach that effectively combines deduplication and traditional compressors to increase compression ratio and efficiency. Specifically, we make full use of deduplication to (1) accelerate data reduction through fast, global deduplication and (2) exploit data locality to compress similar chunks by clustering the data chunks that are adjacent to the same duplicate chunks. Our experimental results from a DEC prototype evaluated on real-world datasets show that, by leveraging deduplication in traditional compression approaches, DEC increases the compression ratio by 20% to 71% and speeds up compression throughput by 17% to 183% compared to traditional compressors, without sacrificing decompression throughput.
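The general idea of placing global deduplication in front of a window-limited compressor can be illustrated with a minimal sketch. This is not the DEC implementation: the fixed 4 KiB chunk size, the SHA-1 fingerprints, the use of zlib, and the function names `dedup_compress` and `dedup_decompress` are all assumptions chosen for illustration, and the sketch omits DEC's clustering of chunks adjacent to the same duplicate chunks.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def dedup_compress(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Deduplicate fixed-size chunks globally, then compress the unique ones."""
    index = {}   # fingerprint -> position of the chunk in `unique`
    unique = []  # unique chunks in first-seen order
    recipe = []  # for each input chunk, the position of its content in `unique`
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        fp = hashlib.sha1(chunk).digest()
        if fp not in index:
            index[fp] = len(unique)
            unique.append(chunk)
        recipe.append(index[fp])
    # Only unique chunks reach the compressor, so duplicates that lie
    # outside its window are still eliminated.
    payload = zlib.compress(b"".join(unique))
    lengths = [len(c) for c in unique]
    return payload, lengths, recipe


def dedup_decompress(payload: bytes, lengths, recipe) -> bytes:
    """Rebuild the original data from the compressed unique chunks and the recipe."""
    raw = zlib.decompress(payload)
    chunks, pos = [], 0
    for n in lengths:
        chunks.append(raw[pos:pos + n])
        pos += n
    return b"".join(chunks[i] for i in recipe)
```

In this sketch, a block that repeats far apart in the input (beyond the 32 KiB DEFLATE window used by zlib) is stored once by the deduplication step, whereas zlib alone would encode it twice.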
Redundancy, Compressors, Throughput, Scalability, Prototypes, Indexing
Zijin Han, Wen Xia, Yuchong Hu, Dan Feng, Yucheng Zhang, Yukun Zhou, Min Fu, Liang Gu, "DEC: An Efficient Deduplication-Enhanced Compression Approach", 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), pp. 519-526, 2016, doi:10.1109/ICPADS.2016.0075