Issue No.08 - August (2011 vol.22)
Bin Tang, Wichita State University, Wichita
Liqiang Wang, University of Wyoming, Laramie
Dharma Teja Nukarapu, Wichita State University, Wichita
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.207
Data replication is widely adopted in data-intensive scientific applications to reduce data file transfer time and bandwidth consumption. However, the problem of data replication in Data Grids, an enabling technology for data-intensive applications, has been proven NP-hard and even non-approximable, which makes it difficult to solve. Meanwhile, most previous research in this field is either theoretical investigation without practical consideration, or heuristics-based with little or no theoretical performance guarantee. In this paper, we propose a data replication algorithm that not only has a provable theoretical performance guarantee, but can also be implemented in a distributed and practical manner. Specifically, we design a polynomial-time centralized replication algorithm that reduces the total data file access delay by at least half of the reduction achieved by the optimal replication solution. Based on this centralized algorithm, we also design a distributed caching algorithm, which can be easily adopted in a distributed environment such as Data Grids. Extensive simulations are performed to validate the efficiency of our proposed algorithms. Using our own simulator, we show that our centralized replication algorithm performs comparably to the optimal algorithm and other intuitive heuristics under different network parameters. Using GridSim, a popular distributed Grid simulator, we demonstrate that the distributed caching technique significantly outperforms an existing popular file caching technique in Data Grids, and that it is more scalable and more adaptive to dynamic changes in file access patterns in Data Grids.
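The abstract does not spell out the centralized algorithm, so the following is only a minimal sketch of one generic greedy placement strategy that fits the description: repeatedly add the single replica that most reduces the total access delay until storage runs out or no placement helps. The function names, the unit-size files, the per-node `capacity` counts, and the use of a hop-count matrix `dist` as the delay measure are all assumptions made for illustration. The factor-of-two guarantee is the paper's claim about its own algorithm and is not established for this sketch.

```python
from itertools import product


def total_delay(dist, freq, replicas):
    """Sum over (node, file) of access frequency times distance to the nearest copy."""
    return sum(
        f * min(dist[n][r] for r in replicas[d])
        for n, row in enumerate(freq)
        for d, f in enumerate(row)
    )


def greedy_replicate(dist, freq, origin, capacity):
    """Hypothetical greedy placement (not taken from the paper).

    dist[a][b]  : delay between nodes a and b (assumed: hop count)
    freq[n][d]  : how often node n accesses file d
    origin[d]   : node holding the original copy of file d (assumed not to use cache space)
    capacity[n] : number of unit-size replicas node n can store
    """
    n_nodes, n_files = len(dist), len(origin)
    replicas = [{origin[d]} for d in range(n_files)]
    free = list(capacity)
    while True:
        base = total_delay(dist, freq, replicas)
        best_gain, best = 0, None
        # Try every feasible (node, file) placement and keep the most beneficial one.
        for n, d in product(range(n_nodes), range(n_files)):
            if free[n] == 0 or n in replicas[d]:
                continue
            replicas[d].add(n)
            gain = base - total_delay(dist, freq, replicas)
            replicas[d].remove(n)
            if gain > best_gain:
                best_gain, best = gain, (n, d)
        if best is None:  # no space left or no placement reduces the delay
            return replicas
        n, d = best
        replicas[d].add(n)
        free[n] -= 1


# Three nodes on a line (0 - 1 - 2); both files originate at node 0.
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
freq = [[0, 0], [1, 0], [5, 4]]
placement = greedy_replicate(dist, freq, origin=[0, 0], capacity=[0, 1, 1])
```

In this toy instance the sketch first places file 0 at node 2, where it is requested most heavily, and then uses the remaining slot at node 1 for file 1. Each round re-evaluates the full delay, which keeps the code short but is far from the most efficient way to compute the gains.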
Data-intensive applications, Data Grids, data replication, algorithm design and analysis, simulations.
Bin Tang, Liqiang Wang, Dharma Teja Nukarapu, "Data Replication in Data Intensive Scientific Applications with Performance Guarantee", IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 8, pp. 1299-1306, August 2011, doi:10.1109/TPDS.2010.207