2010 IEEE Fifth International Conference on Networking, Architecture, and Storage (2010)
Macau, China
July 15, 2010 to July 17, 2010
ISBN: 978-0-7695-4134-1
pp: 403-411
Chunk-level de-duplication brings substantial storage savings to backup and archiving systems, but a prominent challenge facing the technology is how to identify duplicate chunks efficiently and effectively. Because main memory capacity is limited, most of the chunk fingerprints used to identify individual chunks are stored on disk. Checking for a fingerprint match on disk for every input chunk is known to be a severe performance bottleneck for the backup process. However, our intuition and our analyses of real backup data both indicate that duplicate chunks tend to concentrate strongly according to data ownership. Motivated by this observation, and to avoid or alleviate this bottleneck, we propose DAM, a data-ownership-aware multi-layered de-duplication scheme that exploits the ownership of data chunks and uses a tri-layered de-duplication approach to narrow the search space for duplicate chunks, thereby reducing total disk accesses. Our experimental results with real-world datasets show that DAM reduces disk accesses by an average of 60.8% and shortens de-duplication time by an average of 46.3%.
backup, de-duplication, disk accesses
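
The abstract does not spell out DAM's three layers, so the following is only a minimal sketch of the general idea it describes: consult an ownership-scoped fingerprint set before falling back to the expensive on-disk index. The class name `OwnershipAwareIndex`, the tier layout (per-owner set, shared in-memory cache, simulated disk index), the unbounded cache, and the use of SHA-1 as the fingerprint are all assumptions made for illustration, not the paper's design.

```python
import hashlib
from collections import defaultdict


class OwnershipAwareIndex:
    """Hypothetical three-tier fingerprint lookup; NOT DAM's actual layers."""

    def __init__(self):
        # Tier 1 (assumed): in-memory fingerprints grouped by data owner.
        self.owner_index = defaultdict(set)
        # Tier 2 (assumed): in-memory cache shared across owners (unbounded here).
        self.shared_cache = set()
        # Tier 3 (assumed): stand-in for the full on-disk fingerprint index.
        self.disk_index = set()
        # Counts how often the simulated disk index had to be consulted.
        self.disk_lookups = 0

    @staticmethod
    def fingerprint(chunk: bytes) -> str:
        # SHA-1 is an assumed fingerprint function for this sketch.
        return hashlib.sha1(chunk).hexdigest()

    def is_duplicate(self, owner: str, chunk: bytes) -> bool:
        """Return True if the chunk was seen before, recording it either way."""
        fp = self.fingerprint(chunk)

        # Tier 1: the owner's own fingerprints, where duplicates are
        # expected to concentrate.
        if fp in self.owner_index[owner]:
            return True

        # Tier 2: fingerprints seen under any owner, still in memory.
        if fp in self.shared_cache:
            self.owner_index[owner].add(fp)
            return True

        # Tier 3: fall back to the (simulated) on-disk index.
        self.disk_lookups += 1
        found = fp in self.disk_index
        self.disk_index.add(fp)
        self.shared_cache.add(fp)
        self.owner_index[owner].add(fp)
        return found


idx = OwnershipAwareIndex()
first = idx.is_duplicate("alice", b"chunk-a")   # new chunk, needs a disk lookup
second = idx.is_duplicate("alice", b"chunk-a")  # resolved by the owner tier
third = idx.is_duplicate("bob", b"chunk-a")     # resolved by the shared cache
```

In this toy setup only the first lookup touches the simulated disk index; the two repeats are resolved in memory. A real system would have to bound the in-memory tiers and persist the disk index, which this sketch leaves out.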

