Issue No. 03 - March (2016 vol. 27)
ISSN: 1045-9219
pp: 855-868
Min Fu , Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Division of Data Storage System, Wuhan, China
Dan Feng , Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Division of Data Storage System, Wuhan, China
Yu Hua , Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Division of Data Storage System, Wuhan, China
Xubin He , Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA, USA
Zuoning Chen , National Engineering Research Center for Parallel Computer, Beijing, China
Jingning Liu , Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Division of Data Storage System, Wuhan, China
Wen Xia , Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Division of Data Storage System, Wuhan, China
Fangting Huang , Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Division of Data Storage System, Wuhan, China
Qing Liu , Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Division of Data Storage System, Wuhan, China
ABSTRACT
In backup systems, the chunks of each backup are physically scattered after deduplication, which causes a challenging fragmentation problem. We observe that the fragmentation takes the form of sparse and out-of-order containers. Sparse containers decrease restore performance and garbage collection efficiency, while out-of-order containers decrease restore performance if the restore cache is small. To reduce the fragmentation, we propose the History-Aware Rewriting algorithm (HAR) and the Cache-Aware Filter (CAF). HAR exploits historical information in backup systems to accurately identify and reduce sparse containers, and CAF exploits restore cache knowledge to identify the out-of-order containers that hurt restore performance. CAF efficiently complements HAR in datasets where out-of-order containers are dominant. To reduce the metadata overhead of garbage collection, we further propose a Container-Marker Algorithm (CMA) to identify valid containers instead of valid chunks. Our extensive experimental results from real-world datasets show that HAR significantly improves restore performance by 2.84-175.36× at a cost of rewriting only 0.5-2.03 percent of the data.
INDEX TERMS
Containers, Out of order, Image restoration, Merging, Distributed databases, Indexes, Prefetching, performance evaluation, Data deduplication, storage system, chunk fragmentation
CITATION
Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen, Jingning Liu, Wen Xia, Fangting Huang, Qing Liu, "Reducing Fragmentation for In-line Deduplication Backup Storage via Exploiting Backup History and Cache Knowledge", IEEE Transactions on Parallel & Distributed Systems, vol. 27, no. 3, pp. 855-868, March 2016, doi:10.1109/TPDS.2015.2410781