2010 39th International Conference on Parallel Processing (2010)
San Diego, CA, USA
Sept. 13, 2010 to Sept. 16, 2010
DOI: http://doi.ieeecomputersociety.org/10.1109/ICPP.2010.69
Existing de-duplication solutions in cloud backup environments face a tradeoff. Some obtain high compression ratios at the cost of heavy de-duplication overhead, in the form of increased latency and reduced throughput. Others keep de-duplication overhead small at the cost of low compression ratios and therefore high data transmission costs. Both approaches result in a large backup window. In this paper, we present SAM, a Semantic-Aware Multi-tiered source de-duplication framework. SAM first combines global file-level de-duplication with local chunk-level de-duplication, and then exploits file semantics at each stage of the framework. This design obtains an optimal tradeoff between de-duplication efficiency and de-duplication overhead, and it achieves a shorter backup window than existing approaches. Our experimental results with real-world datasets show that SAM not only has a higher de-duplication efficiency/overhead ratio than existing solutions, but also shortens the backup window by an average of 38.7%.
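The multi-tiered idea, a cheap global file-level check followed by a finer local chunk-level check, can be illustrated with a minimal sketch. This is not SAM's implementation. The abstract does not specify which file semantics are used, so the rules below are hypothetical placeholders: small files and already-compressed file types skip chunk-level de-duplication. The names `TwoTierDeduper`, `CHUNK_SIZE`, `SMALL_FILE_LIMIT`, and `COMPRESSED_EXTS`, the fixed-size chunking, and the SHA-1 fingerprints are all illustrative assumptions. The global index is an in-memory stand-in for a cloud-side index.

```python
import hashlib
import os
from dataclasses import dataclass, field

# Hypothetical parameters, not taken from the paper.
CHUNK_SIZE = 4096                # fixed-size chunks; real systems often use content-defined chunking
SMALL_FILE_LIMIT = 8 * 1024      # assumed: files below this size skip chunk-level de-duplication
COMPRESSED_EXTS = {".zip", ".gz", ".jpg", ".mp3", ".mp4"}  # assumed: low chunk-level redundancy


@dataclass
class TwoTierDeduper:
    # Stand-in for a global (cloud-side) index of whole-file fingerprints.
    global_file_index: set = field(default_factory=set)
    # Local (client-side) index of chunk fingerprints.
    local_chunk_index: set = field(default_factory=set)

    def backup(self, name: str, data: bytes) -> int:
        """Return the number of bytes that would have to be transmitted."""
        # Tier 1: global file-level de-duplication on the whole-file fingerprint.
        file_fp = hashlib.sha1(data).hexdigest()
        if file_fp in self.global_file_index:
            return 0
        self.global_file_index.add(file_fp)

        # Hypothetical semantic filter: skip chunking where it is unlikely to pay off.
        ext = os.path.splitext(name)[1].lower()
        if len(data) < SMALL_FILE_LIMIT or ext in COMPRESSED_EXTS:
            return len(data)

        # Tier 2: local chunk-level de-duplication; send only unseen chunks.
        sent = 0
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            chunk_fp = hashlib.sha1(chunk).hexdigest()
            if chunk_fp not in self.local_chunk_index:
                self.local_chunk_index.add(chunk_fp)
                sent += len(chunk)
        return sent
```

In this sketch, an exact duplicate file costs one hash and no chunking at all. Chunk-level work is spent only on files that the assumed semantic rules deem worth it, which mirrors the efficiency/overhead tradeoff the abstract describes.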
Keywords: Cloud Backup, Backup Window, Data Deduplication, File Semantics
Y. Tan, D. Feng, H. Jiang, Z. Yan, G. Zhou and L. Tian, "SAM: A Semantic-Aware Multi-tiered Source De-duplication Framework for Cloud Backup," 2010 39th International Conference on Parallel Processing (ICPP), San Diego, CA, USA, 2010, pp. 614-623.